Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems. In this work, we extend pseudo-labeling to massively multilingual speech recognition with 60 languages. We propose a simple pseudo-labeling recipe that works well even with low-resource languages: train a supervised multilingual model, fine-tune it with semi-supervised learning on a target language, generate pseudo-labels for that language, and train a final model using pseudo-labels for all languages, either from scratch or by fine-tuning. Experiments on the labeled Common Voice and unlabeled VoxPopuli datasets show that our recipe can yield a model with better performance for many languages that also transfers well to LibriSpeech.
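To make the recipe concrete, here is a minimal sketch of its four steps. All function and dataset names below are hypothetical placeholders standing in for real training code, not an API from the paper; the structure simply mirrors the pipeline the abstract describes.

```python
from typing import Dict, List, Optional, Tuple

LANGUAGES: List[str] = ["de", "fr", "es"]  # stand-ins for the 60 languages

def train_supervised(labeled: Dict[str, list]) -> str:
    """Step 1: train one multilingual model on all labeled data."""
    return "multilingual-supervised"

def fine_tune_ssl(model: str, lang: str, unlabeled: list) -> str:
    """Step 2: fine-tune the model with semi-supervised learning on one target language."""
    return f"{model}+ssl-{lang}"

def generate_pseudo_labels(model: str, unlabeled: list) -> List[Tuple[str, str]]:
    """Step 3: transcribe unlabeled audio in that language to produce pseudo-labels."""
    return [(utt, f"hypothesis({utt})") for utt in unlabeled]

def train_final(labeled: Dict[str, list],
                pseudo: Dict[str, list],
                init: Optional[str] = None) -> str:
    """Step 4: train on labeled + pseudo-labeled data for all languages,
    either from scratch (init=None) or by fine-tuning an existing model."""
    return "final-from-scratch" if init is None else f"{init}-finetuned"

# Placeholder corpora, e.g. labeled Common Voice and unlabeled VoxPopuli.
labeled = {lang: [f"{lang}_cv_utterance"] for lang in LANGUAGES}
unlabeled = {lang: [f"{lang}_vp_utterance"] for lang in LANGUAGES}

base = train_supervised(labeled)
pseudo: Dict[str, list] = {}
for lang in LANGUAGES:
    per_lang_model = fine_tune_ssl(base, lang, unlabeled[lang])
    pseudo[lang] = generate_pseudo_labels(per_lang_model, unlabeled[lang])

final_scratch = train_final(labeled, pseudo)          # train from scratch
final_finetune = train_final(labeled, pseudo, base)   # or fine-tune the base model
```

The per-language fine-tuning step is what makes the recipe viable for low-resource languages: pseudo-labels for each language are generated by a model specialized to it, rather than by the generic multilingual model alone.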