Training a multi-speaker Text-to-Speech (TTS) model from scratch is computationally expensive, and adding new speakers to the dataset requires re-training the model. The naive solution of sequentially fine-tuning a model on new speakers can degrade its performance on previously learned speakers, a phenomenon known as catastrophic forgetting. In this paper, we approach TTS modeling from a continual learning perspective, where the goal is to add new speakers without forgetting previous ones. We first propose an experimental setup and show that sequential fine-tuning on new speakers can result in forgetting of the previous speakers. We then exploit two well-known continual learning techniques, namely experience replay and weight regularization, and show how these methods can mitigate the degradation of speech synthesis diversity during sequential training on new speakers. Finally, we present a simple extension that improves the results in extreme setups.
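To make the two continual learning ingredients concrete, the following is a minimal PyTorch-style sketch of one sequential fine-tuning stage that combines experience replay (rehearsing a small buffer of past speakers' utterances) with weight regularization (an L2 penalty anchoring the parameters to the previous model). This is an illustrative sketch under assumed interfaces, not the paper's implementation: the model, the placeholder `tts_loss`, the `ReplayBuffer` class, and all coefficients are hypothetical stand-ins.

```python
# Sketch: sequential speaker fine-tuning with experience replay and an
# L2 weight-regularization anchor. All names and hyperparameters here are
# illustrative assumptions, not the method described in the paper.

import random
import torch
import torch.nn as nn


def tts_loss(model, batch):
    # Placeholder reconstruction loss; a real TTS system would compare
    # predicted and ground-truth mel-spectrograms (plus duration terms).
    text_features, mel_target = batch
    return nn.functional.mse_loss(model(text_features), mel_target)


class ReplayBuffer:
    """Keeps a bounded random sample of utterances from previously seen speakers."""

    def __init__(self, capacity: int = 256):
        self.capacity = capacity
        self.items = []

    def add(self, batch):
        # Reservoir-style insertion keeps a roughly uniform sample over time.
        if len(self.items) < self.capacity:
            self.items.append(batch)
        else:
            self.items[random.randrange(len(self.items))] = batch

    def sample(self):
        return random.choice(self.items) if self.items else None


def finetune_on_new_speaker(model, new_speaker_batches, replay,
                            reg_lambda=1e-3, replay_weight=1.0, lr=1e-4):
    """One fine-tuning stage: new-speaker loss + replayed loss + L2 anchor."""
    old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
    opt = torch.optim.Adam(model.parameters(), lr=lr)

    for batch in new_speaker_batches:
        loss = tts_loss(model, batch)                 # loss on the new speaker

        replayed = replay.sample()                    # experience replay term
        if replayed is not None:
            loss = loss + replay_weight * tts_loss(model, replayed)

        # Weight regularization: penalize drift from the previous model.
        reg = sum(((p - old_params[n]) ** 2).sum()
                  for n, p in model.named_parameters())
        loss = loss + reg_lambda * reg

        opt.zero_grad()
        loss.backward()
        opt.step()

        replay.add(batch)                             # keep data for later stages
    return model


if __name__ == "__main__":
    # Toy usage: a linear "model" mapping text features to mel frames.
    model = nn.Linear(16, 80)
    replay = ReplayBuffer()
    batches = [(torch.randn(8, 16), torch.randn(8, 80)) for _ in range(10)]
    finetune_on_new_speaker(model, batches, replay)
```

A more faithful variant of the regularization term would weight each parameter by its estimated importance (as in Elastic Weight Consolidation) rather than using a uniform L2 penalty; the uniform penalty is used here only to keep the sketch short.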