Code-switching (CS) is a common linguistic phenomenon in multilingual communities in which speakers alternate between languages while speaking. This paper presents our investigation of end-to-end speech recognition for Mandarin-English CS speech. We analyse several CS-specific issues: the mismatch in properties between the two languages of a CS pair, the unpredictable nature of switching points, and data scarcity. We build on and improve a state-of-the-art end-to-end system in five ways. We merge non-linguistic symbols, integrate language identification through a hierarchical softmax, model sub-word units, artificially lower the speaking rate, and augment the data with speed perturbation and several monolingual datasets. Together, these changes aim to improve performance not only on CS speech but also on monolingual benchmarks, making the system more applicable in real-life settings. Finally, we explore how different language model integration methods affect the performance of the proposed model. Our experimental results show that all of the proposed techniques improve recognition performance. The best combined system reduces the mixed error rate (MER) of the baseline by up to 35% relative and delivers acceptable performance on monolingual benchmarks.
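The headline result is stated as a relative reduction in mixed error rate. The sketch below is a rough illustration that computes MER under a convention commonly used for Mandarin-English CS. This section does not spell that convention out, so it is an assumption here: Mandarin is scored per character, English per word, and the error rate is the token-level edit distance divided by the number of reference tokens. The helper names `tokenize_mixed`, `mixed_error_rate`, and `relative_improvement` are hypothetical. The CJK range used (U+4E00 to U+9FFF) covers only the basic unified ideographs.

```python
import re


def tokenize_mixed(text: str) -> list[str]:
    # Assumed MER convention: each CJK character is one token,
    # each run of Latin letters (and apostrophes) is one English word token.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text)


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Levenshtein distance over tokens (substitutions + deletions + insertions),
    # computed row by row to keep memory linear in len(hyp).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def mixed_error_rate(ref: str, hyp: str) -> float:
    # MER = token edit distance / number of reference tokens.
    ref_toks = tokenize_mixed(ref)
    if not ref_toks:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_toks, tokenize_mixed(hyp)) / len(ref_toks)


def relative_improvement(baseline: float, system: float) -> float:
    # Relative reduction of an error rate with respect to the baseline.
    return (baseline - system) / baseline
```

For example, the reference 我们明天有 meeting has six tokens. The hypothesis 我们今天有 meetings contains two substitutions, so its MER is 2/6, or about 0.33. A "35% relative" reduction means the new MER is 0.65 times the baseline MER. For instance, a hypothetical baseline MER of 0.20 would drop to 0.13.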