Distribution augmentation for low-resource expressive text-to-speech

Mateusz Lajszczak, Animesh Prasad, Arent van Korlaar, Bajibabu Bollepalli, Antonio Bonafonte, Arnaud Joly, Marco Nicolis, Alexis Moinet, Thomas Drugman, Trevor Wood, Elena Sokolova

From arXiv. ICASSP 2022 (camera-ready).

This paper presents a novel data augmentation technique for text-to-speech (TTS) that generates new (text, audio) training examples without requiring any additional data. Our goal is to increase the diversity of text conditionings available during training, which helps to reduce overfitting, especially in low-resource settings. Our method relies on substituting text and audio fragments in a way that preserves syntactic correctness. We take additional measures to ensure that synthesized speech does not contain artifacts caused by combining inconsistent audio samples. Perceptual evaluations show that our method improves speech quality across a number of datasets, speakers, and TTS architectures. We also demonstrate that it greatly improves the robustness of attention-based TTS models.
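The fragment-substitution idea can be illustrated with a toy sketch. The abstract does not give the actual procedure, so everything below is an assumption made for illustration: the `Fragment` type, the constituent labels ("NP", "VP"), and the `substitute` and `render` helpers are hypothetical names, and the audio is a list of placeholder numbers rather than real waveforms. The sketch swaps one fragment of a target utterance for a donor fragment with the same syntactic label, then concatenates text and audio to form a new training pair. It does not model the additional measures the authors mention for avoiding artifacts from inconsistent audio samples.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Fragment:
    label: str          # syntactic label of the constituent (assumed tag set)
    text: str           # words covered by this constituent
    audio: List[float]  # audio samples aligned to those words (toy values)


Utterance = List[Fragment]


def substitute(target: Utterance, donor: Utterance, rng: random.Random) -> Utterance:
    """Copy `target`, replacing one fragment with a donor fragment that has
    the same syntactic label. If no label matches, return an unchanged copy."""
    candidates = [
        (i, d)
        for i, t in enumerate(target)
        for d in donor
        if d.label == t.label and d.text != t.text
    ]
    if not candidates:
        return list(target)
    i, d = rng.choice(candidates)
    return target[:i] + [d] + target[i + 1:]


def render(utt: Utterance) -> Tuple[str, List[float]]:
    """Concatenate fragment texts and audio into one (text, audio) pair."""
    text = " ".join(f.text for f in utt)
    audio = [sample for f in utt for sample in f.audio]
    return text, audio


# Two toy utterances with hand-assigned labels and placeholder audio.
a = [Fragment("NP", "the cat", [0.1, 0.2]), Fragment("VP", "sat down", [0.3])]
b = [Fragment("NP", "a dog", [0.5]), Fragment("VP", "ran away", [0.6, 0.7])]

new = substitute(a, b, random.Random(0))
text, audio = render(new)
print(text, audio)
```

Because the swap is restricted to fragments with matching labels, the sequence of labels in the new utterance is the same as in the original, which is the sense in which this toy version preserves syntactic structure. A real system would also need word-level audio alignments and some check that the spliced audio is acoustically consistent.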

Related content

Speech synthesis

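Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has evolved from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain specific applications. Today, speech synthesis is widely used in information announcement systems in banks and hospitals, in-car navigation systems, and automated call centers, and has delivered substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other devices closely tied to daily life, applications of speech synthesis are gradually extending into entertainment, language teaching, and rehabilitation therapy. Speech synthesis can be said to be influencing every aspect of people's lives.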
