We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of audio, ultrasound tongue imaging, and lip videos. TaL consists of two parts: TaL1 is a set of six recording sessions of one professional voice talent, a male native speaker of English; TaL80 is a set of recording sessions of 81 native speakers of English without voice talent experience. Overall, the corpus contains 24 hours of parallel ultrasound, video, and audio data, of which approximately 13.5 hours are speech. This paper describes the corpus and presents benchmark results for the tasks of speech recognition, speech synthesis (articulatory-to-acoustic mapping), and automatic synchronisation of ultrasound to audio. The TaL corpus is publicly available under the CC BY-NC 4.0 license.