Language identification greatly impacts the success of downstream tasks such as automatic speech recognition. Recently, self-supervised speech representations learned by wav2vec 2.0 have been shown to be very effective for a range of speech tasks. We extend previous self-supervised work on language identification by experimenting with pre-trained models which were learned on real-world unconstrained speech in multiple languages, not just English. We show that models pre-trained on many languages perform better and enable language identification systems that require very little labeled data to perform well. Results on a 25-language setup show that with only 10 minutes of labeled data per language, a cross-lingually pre-trained model can achieve over 93% accuracy.
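To make the setup concrete, below is a minimal sketch of fine-tuning a cross-lingually pre-trained wav2vec 2.0 model for language identification. It assumes the HuggingFace transformers API (Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor) and the publicly released XLSR-53 checkpoint facebook/wav2vec2-large-xlsr-53; the abstract does not describe the authors' actual implementation, so this is illustrative only.

```python
# Illustrative sketch: HuggingFace `transformers` used as a stand-in for
# the authors' unspecified pipeline.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

NUM_LANGUAGES = 25  # the 25-language setup reported in the abstract

# Cross-lingually pre-trained wav2vec 2.0 encoder (XLSR-53). The
# classification head on top is randomly initialized and must be trained
# on the small amount of labeled data (e.g. 10 minutes per language).
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53", num_labels=NUM_LANGUAGES
)
feature_extractor = Wav2Vec2FeatureExtractor()  # defaults: 16 kHz, normalized input

# Dummy 5-second, 16 kHz waveform standing in for a labeled utterance.
waveform = torch.randn(16000 * 5)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

# Forward pass; in practice a standard training loop would fine-tune the
# head (and optionally the encoder) with cross-entropy over language labels.
with torch.no_grad():
    logits = model(**inputs).logits
predicted_language = int(logits.argmax(dim=-1))
```

Whether the encoder is frozen or fine-tuned end-to-end is a design choice; the abstract only states that very little labeled data is needed once cross-lingual pre-training is in place.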