从暗数据用暗数据进行文字到语音合成,并选择评价在环内数据选择 (Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection)

This paper proposes a method for selecting training data for text-to-speech (TTS) synthesis from dark data. TTS models are typically trained on high-quality speech corpora that cost much time and money for data collection, which makes it very challenging to increase speaker variation. In contrast, there is a large amount of data whose availability is unknown (a.k.a, "dark data"), such as YouTube videos. To utilize data other than TTS corpora, previous studies have selected speech data from the corpora on the basis of acoustic quality. However, considering that TTS models robust to data noise have been proposed, we should select data on the basis of its importance as training data to the given TTS model, not the quality of speech itself. Our method with a loop of training and evaluation selects training data on the basis of the automatically predicted quality of synthetic speech of a given TTS model. Results of evaluations using YouTube data reveal that our method outperforms the conventional acoustic-quality-based method.

翻译：本文建议了从暗数据中选择文本到语音合成培训数据的方法。 TTS模型通常在高质量的语音组合方面受过培训,这种组合花费了大量的时间和金钱来收集数据,因此很难增加发言者的变异性。相比之下,有大量数据(a.k.a,“暗数据”),如YouTube视频等,其可用性未知(a.k.a,“暗数据”)。为了使用TTS Corbora以外的数据,以前的研究根据声学质量从Corpora中选择了语音数据。然而,考虑到已经提出了对数据噪音具有活力的 TTS模型,我们应该根据这些数据作为特定 TTS模型培训数据的重要性而不是语言质量本身来选择数据。我们通过培训和评价循环选择了培训数据,其依据是特定TTS模型合成语言的自动预测质量。使用YouTube数据的评价结果显示,我们的方法超过了传统的声学质量方法。

相关内容

语音合成

关注 491

语音合成（Speech Synthesis），也称为文语转换（Text-to-Speech, TTS,它是将任意的输入文本转换成自然流畅的语音输出。语音合成涉及到人工智能、心理学、声学、语言学、数字信号处理、计算机科学等多个学科技术，是信息处理领域中的一项前沿技术。随着计算机技术的不断提高，语音合成技术从早期的共振峰合成,逐步发展为波形拼接合成和统计参数语音合成，再发展到混合语音合成；合成语音的质量、自然度已经得到明显提高，基本能满足一些特定场合的应用需求。目前，语音合成技术在银行、医院等的信息播报系统、汽车导航系统、自动应答呼叫中心等都有广泛应用，取得了巨大的经济效益。另外，随着智能手机、MP3、PDA 等与我们生活密切相关的媒介的大量涌现，语音合成的应用也在逐渐向娱乐、语音教学、康复治疗等领域深入。可以说语音合成正在影响着人们生活的方方面面。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【CVPR 2022】盲图像超分辨率退化分布的研究，Learning the Degradation Distribution for Blind Image Super-Resolution

专知会员服务

7+阅读 · 2022年3月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日