Voice conversion (VC) has been proposed as a way to improve speech recognition systems for low-resource languages by augmenting their limited training data. Until recently, however, practical issues such as compute speed have limited the use of VC for this purpose. Moreover, it is still unclear whether a VC model trained on one well-resourced language can be applied to speech from a different, low-resource language for data augmentation. In this work, we assess whether a VC system can be used cross-lingually to improve low-resource speech recognition. Concretely, we combine several recent techniques to design and train a practical VC system on English, and then use this system to augment the training data for speech recognition models in several low-resource languages. We find that, when a sensible amount of augmented data is used, speech recognition performance improves in all four low-resource languages considered.
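As a rough illustration of the augmentation step described above, the following minimal sketch adds voice-converted copies of each low-resource utterance to the training set while keeping the transcripts unchanged. It is not the authors' implementation: the `Utterance` type, the `augment_with_vc` function, the `convert` callable standing in for a trained VC model, and the `copies_per_utterance` parameter are all hypothetical names introduced here for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Audio = Tuple[float, ...]  # placeholder for a waveform; an assumption of this sketch


@dataclass(frozen=True)
class Utterance:
    audio: Audio
    transcript: str
    speaker: str


def augment_with_vc(
    utterances: Sequence[Utterance],
    convert: Callable[[Audio, str], Audio],  # hypothetical VC model: (audio, target speaker) -> audio
    target_speakers: Sequence[str],
    copies_per_utterance: int = 1,
    seed: int = 0,
) -> List[Utterance]:
    """Return the original utterances followed by voice-converted copies.

    Each copy keeps the source transcript, since VC is assumed to change
    the speaker identity but not the spoken content.
    """
    rng = random.Random(seed)
    augmented = list(utterances)  # originals are always kept
    for utt in utterances:
        for _ in range(copies_per_utterance):
            target = rng.choice(target_speakers)
            augmented.append(
                Utterance(
                    audio=convert(utt.audio, target),
                    transcript=utt.transcript,
                    speaker=target,
                )
            )
    return augmented
```

In this sketch, `copies_per_utterance` is the knob that controls how much augmented data is mixed in; the abstract indicates that the amount matters but does not state a specific value.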