FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection

With the ongoing development of text-to-speech (TTS) and voice conversion (VC) technologies, detecting synthetic speech has become dramatically more difficult. To promote the development of synthetic speech detection models against Mandarin TTS and VC technologies, we have constructed a challenging Mandarin dataset and organized the accompanying audio track of the first Fake Media Forensic Challenge of the China Society of Image and Graphics (FMFCC-A). The FMFCC-A dataset is to date the largest publicly available Mandarin dataset for synthetic speech detection. It contains 40,000 synthesized Mandarin utterances generated by 11 Mandarin TTS systems and two Mandarin VC systems, and 10,000 genuine Mandarin utterances collected from 58 speakers. The dataset is divided into training, development, and evaluation sets, which support research on detecting synthesized Mandarin speech under various previously unknown speech synthesis systems or audio post-processing operations. In addition to describing the construction of the FMFCC-A dataset, we provide a detailed analysis of two baseline methods and the top-performing submissions from the FMFCC-A, which illustrates both the usefulness and the difficulty of the dataset. We hope that the FMFCC-A dataset can fill the gap left by the lack of Mandarin datasets for synthetic speech detection.

Related content

Speech synthesis

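Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on multiple disciplines, including artificial intelligence, psychology, acoustics, linguistics, digital signal processing, and computer science, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation synthesis and statistical parametric synthesis, and then to hybrid synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain specific applications. Speech synthesis is now widely used in information announcement systems in banks and hospitals, in car navigation systems, and in automated call centers, and has yielded substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other media closely tied to daily life, its applications are extending into entertainment, language teaching, and rehabilitation therapy. Speech synthesis can fairly be said to be influencing every aspect of people's lives.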