Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language-specific. Training a multilingual system for Indic languages is even harder because open-source datasets and published results for different approaches are scarce. We compare the performance of an end-to-end multilingual speech recognition system with that of monolingual models conditioned on language identification (LID). The decoding information from a multilingual model is used for language identification and then combined with monolingual models, yielding a 50% improvement in word error rate (WER) across languages. We also propose a similar technique for the code-switching problem and achieve WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively. Our work shows how transformer-based ASR, especially wav2vec 2.0, can be applied to develop multilingual and code-switched ASR for Indic languages.
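The two-pass idea described above (decode with a multilingual model, infer the language from the decoded output, then re-decode with the matching monolingual model) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the multilingual model emits text in each language's native script, so that a majority vote over Unicode script blocks can stand in for LID. The names `identify_language`, `transcribe_with_lid`, and `SCRIPT_RANGES`, and the callable model interface, are hypothetical. Languages that share a script (for example Hindi and Marathi, both written in Devanagari) cannot be separated this way and would need a stronger LID signal.

```python
from collections import Counter
from typing import Callable, Dict, Optional

# Unicode block ranges for a few Indic scripts (illustrative subset, an
# assumption of this sketch rather than the language set used in the paper).
SCRIPT_RANGES = {
    "hi": (0x0900, 0x097F),  # Devanagari
    "bn": (0x0980, 0x09FF),  # Bengali
    "gu": (0x0A80, 0x0AFF),  # Gujarati
    "or": (0x0B00, 0x0B7F),  # Odia
    "ta": (0x0B80, 0x0BFF),  # Tamil
    "te": (0x0C00, 0x0C7F),  # Telugu
}

# Hypothetical model interface: any callable mapping raw audio to text.
Transcriber = Callable[[bytes], str]


def identify_language(transcript: str) -> Optional[str]:
    """Guess the language by majority vote over the script of each character."""
    counts: Counter = Counter()
    for ch in transcript:
        cp = ord(ch)
        for lang, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= cp <= hi:
                counts[lang] += 1
                break
    if not counts:
        return None  # no character from a known script was decoded
    return counts.most_common(1)[0][0]


def transcribe_with_lid(
    audio: bytes,
    multilingual: Transcriber,
    monolingual: Dict[str, Transcriber],
) -> str:
    """First pass with the multilingual model, second pass with a monolingual one."""
    first_pass = multilingual(audio)
    lang = identify_language(first_pass)
    model = monolingual.get(lang)
    if model is None:
        # Unknown language or no monolingual model: keep the first-pass output.
        return first_pass
    return model(audio)
```

In practice the two `Transcriber` callables would wrap fine-tuned wav2vec 2.0 checkpoints. Falling back to the first-pass transcript keeps the pipeline usable when LID is inconclusive.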