Multilingual end-to-end models have shown great improvements over monolingual systems. With the development of pre-training methods on speech, self-supervised multilingual speech representation learning such as XLSR has shown success in improving the performance of multilingual automatic speech recognition (ASR). However, similar to supervised learning, multilingual pre-training may also suffer from language interference, which further limits the applicability of multilingual systems. In this paper, we introduce several techniques for improving self-supervised multilingual pre-training by leveraging auxiliary language information, including language adversarial training, language embeddings, and language adaptive training during the pre-training stage. We conduct experiments on a multilingual ASR task covering 16 languages. Our experimental results demonstrate a 14.3% relative gain over the standard XLSR model and a 19.8% relative gain over the multilingual model without pre-training.
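As a rough illustration of one of these techniques, the following is a minimal sketch of language adversarial training on top of an XLSR-style encoder, assuming PyTorch. The encoder interface, hidden size, pooling strategy, and adversarial weight are hypothetical placeholders and do not reflect the paper's exact architecture or loss weighting; the sketch only shows the gradient-reversal idea of discouraging language-discriminative features in the shared encoder.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class LanguageAdversarialHead(nn.Module):
    """Predicts the language ID from encoder features through a gradient-reversal layer,
    pushing the shared encoder toward language-invariant representations."""

    def __init__(self, hidden_dim: int, num_languages: int, lambd: float = 0.1):
        super().__init__()
        self.lambd = lambd  # adversarial weight (hypothetical value)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_languages),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, hidden_dim) from the self-supervised encoder
        reversed_feats = GradientReversal.apply(features, self.lambd)
        pooled = reversed_feats.mean(dim=1)  # utterance-level average pooling
        return self.classifier(pooled)       # (batch, num_languages) logits


if __name__ == "__main__":
    batch, time, hidden_dim, num_languages = 4, 50, 768, 16
    features = torch.randn(batch, time, hidden_dim, requires_grad=True)
    lang_ids = torch.randint(0, num_languages, (batch,))

    head = LanguageAdversarialHead(hidden_dim, num_languages)
    adv_loss = nn.CrossEntropyLoss()(head(features), lang_ids)
    # During pre-training this term would be added to the self-supervised objective.
    adv_loss.backward()
```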