Success in designing Code-Switching (CS) ASR often depends on the availability of transcribed CS resources. This dependency hampers the development of ASR for low-resourced languages such as Bengali and Hindi. In this paper, we exploit transfer learning to design End-to-End (E2E) CS ASR systems for two low-resourced language pairs, using different monolingual speech data and a small set of noisy CS data. We train the CS-ASR in two steps: (i) building a robust bilingual ASR system using a convolution-augmented transformer (Conformer) based acoustic model and an n-gram language model, and (ii) fine-tuning the entire E2E ASR with limited noisy CS data. We tested the proposed method using the noisy CS data released for the Hindi-English and Bengali-English pairs in the Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages (MUCS 2021) and achieved 3rd place in the CS track. Unlike the two leading systems, which benefited from crawling YouTube and learning transliteration pairs, our transfer learning approach uses only the limited CS data, with no data cleaning or data re-segmentation. Our approach achieved a 14.1% relative gain in word error rate (WER) on Hindi-English and 27.1% on Bengali-English. We provide detailed guidelines on the steps to fine-tune the self-attention based model with limited data for ASR. Moreover, we release the code and recipe used in this paper.
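The relative WER gains reported in the abstract can be read as relative reductions against a reference system. The following is a minimal sketch, assuming the standard definitions: WER as the word-level edit distance divided by the reference length, and relative gain as (baseline WER - system WER) / baseline WER. The abstract does not state which baseline was used, and the function names, example sentences, and WER values below are hypothetical illustrations, not the paper's scoring code or numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_gain(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction of a system with respect to a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical code-switched utterance with one substituted word.
    wer = word_error_rate("main office ja raha hoon", "main office ja rahi hoon")
    print(f"WER: {wer:.1%}")
    # Hypothetical WER values, chosen only to illustrate the formula.
    print(f"Relative gain: {relative_wer_gain(30.0, 24.0):.1%}")
```

Under this definition, a 14.1% relative gain means the system's WER is 14.1% lower than the baseline's WER, not 14.1 absolute points lower.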