Code-switching (CS) refers to the phenomenon in which languages alternate within a single speech signal, and it leads to language confusion for automatic speech recognition (ASR). This paper addresses language confusion to improve CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information into the CS-ASR model by dynamically biasing it with token-level language posteriors, which are the outputs of a sequence-to-sequence auxiliary language diarization (LD) module. In contrast, the disentangling process reduces the difference between languages via adversarial training, so as to normalize the two languages. We conduct experiments on the SEAME dataset. Compared with the baseline model, both joint optimization with LD and the language posterior bias improve performance. A comparison of the two proposed methods indicates that incorporating language information is more effective than disentangling it for reducing language confusion in CS speech.
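To make the first idea concrete, the following is a minimal PyTorch-style sketch of dynamically biasing a decoder with token-level language posteriors. The module name, hidden sizes, and the additive form of the bias are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch: bias decoder token embeddings with token-level language
# posteriors produced by an auxiliary LD module (names/dims assumed).
import torch
import torch.nn as nn

class LanguagePosteriorBias(nn.Module):
    """Projects token-level language posteriors into the decoder's
    hidden space and adds them as a dynamic bias."""
    def __init__(self, n_langs: int = 2, d_model: int = 256):
        super().__init__()
        self.proj = nn.Linear(n_langs, d_model)

    def forward(self, token_emb: torch.Tensor, lang_post: torch.Tensor) -> torch.Tensor:
        # token_emb: (batch, seq, d_model) decoder token embeddings
        # lang_post: (batch, seq, n_langs) posteriors from the LD module
        return token_emb + self.proj(lang_post)

if __name__ == "__main__":
    bias = LanguagePosteriorBias()
    emb = torch.randn(2, 10, 256)                         # dummy embeddings
    post = torch.softmax(torch.randn(2, 10, 2), dim=-1)   # dummy posteriors
    print(bias(emb, post).shape)                          # torch.Size([2, 10, 256])
```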
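The disentangling direction is commonly realized with a gradient-reversal layer (GRL), a standard way to implement adversarial training; the sketch below follows that convention, and the classifier head and scaling factor are assumptions rather than the paper's exact setup.

```python
# Sketch: adversarial language disentanglement via gradient reversal.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) gradients so the shared encoder is pushed
        # toward language-invariant representations.
        return -ctx.lambd * grad_output, None

class LanguageAdversary(nn.Module):
    """Language classifier trained adversarially against the encoder:
    it learns to predict the language, while reversed gradients make
    the encoder normalize the two languages."""
    def __init__(self, d_model: int = 256, n_langs: int = 2, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.clf = nn.Linear(d_model, n_langs)

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        # enc_out: (batch, time, d_model) shared encoder states
        return self.clf(GradReverse.apply(enc_out, self.lambd))
```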