TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos

Paper title

TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos

Authors

Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals

Abstract

We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of audio, ultrasound tongue imaging, and lip videos. TaL consists of two parts: TaL1 is a set of six recording sessions of one professional voice talent, a male native speaker of English; TaL80 is a set of recording sessions of 81 native speakers of English without voice talent experience. Overall, the corpus contains 24 hours of parallel ultrasound, video, and audio data, of which approximately 13.5 hours are speech. This paper describes the corpus and presents benchmark results for the tasks of speech recognition, speech synthesis (articulatory-to-acoustic mapping), and automatic synchronisation of ultrasound to audio. The TaL corpus is publicly available under the CC BY-NC 4.0 license.
