Cross-lingual speech adaptation aims to leverage multiple rich-resource languages to build models for a low-resource target language. Because the low-resource language has limited training data, speech recognition models can easily overfit. In this paper, we propose to use adapters for parameter-efficient cross-lingual speech adaptation. Based on our previous MetaAdapter, which implicitly leverages adapters, we propose a novel algorithm called SimAdapter for explicitly learning knowledge from adapters. Both algorithms build on adapters, which can be easily integrated into the Transformer structure. MetaAdapter leverages meta-learning to transfer general knowledge from the training data to the test language. SimAdapter aims to learn the similarities between the source and target languages during fine-tuning using the adapters. We conduct extensive experiments on five low-resource languages from the Common Voice dataset. Results demonstrate that our MetaAdapter and SimAdapter methods reduce WER by 2.98% and 2.55%, respectively, with only 2.5% and 15.5% of the trainable parameters of the strong full-model fine-tuning baseline. Moreover, the two algorithms can be combined for better performance, yielding up to a 3.55% relative WER reduction.
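To make the adapter idea concrete, the following is a minimal sketch of a standard bottleneck adapter of the kind commonly inserted after Transformer sub-layers. It is an illustration under assumed design choices (class name, ReLU non-linearity, and a bottleneck dimension of 64 are our assumptions), not the paper's released implementation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: layer norm, down-projection, non-linearity,
    up-projection, and a residual connection back to the input."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual keeps the frozen backbone's representation intact;
        # only the small down/up projections are trained per language.
        return x + self.up(torch.relu(self.down(self.norm(x))))
```

Because only the adapter parameters are updated while the backbone stays frozen, the trainable-parameter budget stays small, which is what enables the 2.5% figure reported above.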
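The SimAdapter idea of learning cross-lingual similarities can likewise be sketched as attention over per-language adapter outputs, reusing the Adapter module from the sketch above. This is a hypothetical reconstruction, not the authors' code: the class name SimAdapterFusion, the num_sources parameter, and the exact attention parameterization are all assumptions.

```python
class SimAdapterFusion(nn.Module):
    """Hypothetical sketch: attention over source-language adapters.

    Each source language keeps its own adapter; an attention layer learns
    how similar the target representation is to each source and mixes the
    adapter outputs accordingly during fine-tuning."""

    def __init__(self, d_model: int, num_sources: int, bottleneck: int = 64):
        super().__init__()
        self.adapters = nn.ModuleList(
            [Adapter(d_model, bottleneck) for _ in range(num_sources)]
        )
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stack the outputs of all source-language adapters: (S, B, T, D).
        outs = torch.stack([a(x) for a in self.adapters], dim=0)
        q = self.query(x)   # (B, T, D)
        k = self.key(outs)  # (S, B, T, D)
        # Scaled similarity of the target representation to each source.
        scores = torch.einsum('btd,sbtd->sbt', q, k) / (x.size(-1) ** 0.5)
        weights = torch.softmax(scores, dim=0)  # normalize over sources
        return torch.einsum('sbt,sbtd->btd', weights, outs)
```

The learned attention weights can be read out as a soft estimate of which source languages the target draws on most, which is one way such explicit similarity learning can be inspected after fine-tuning.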