The pervasiveness of intra-utterance code-switching (CS) in spoken content requires automatic speech recognition (ASR) systems to handle mixed-language input. Designing a CS-ASR system poses many challenges, mainly due to data scarcity, grammatical structure complexity, and domain mismatch. The most common method for addressing CS is to train an ASR system on the available transcribed CS speech together with monolingual data. In this work, we propose a zero-shot learning methodology for CS-ASR that augments the monolingual data with artificially generated CS text. We base our approach on random lexical replacements and the Equivalence Constraint (EC), exploiting aligned translation pairs to generate random and grammatically valid CS content. Our empirical results show a 65.5% relative reduction in language model perplexity and a 7.7% relative reduction in ASR word error rate (WER) on two ecologically valid CS test sets. Human evaluation of the text generated using EC suggests that more than 80% is of adequate quality.
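The random lexical replacement idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_lexical_replacement`, the `replace_prob` parameter, the alignment format (a list of source-index/target-index pairs), and the toy English-Spanish sentence pair are all assumptions made for illustration.

```python
import random


def random_lexical_replacement(src_tokens, tgt_tokens, alignment,
                               replace_prob=0.3, rng=None):
    """Swap aligned source words for their target-language counterparts at random.

    alignment is a list of (source_index, target_index) pairs. This format is
    an assumption of the sketch, not something specified in the abstract.
    """
    rng = rng or random.Random()

    # Map each source position to the target positions aligned with it.
    src_to_tgt = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, []).append(t)

    mixed = []
    for i, tok in enumerate(src_tokens):
        targets = src_to_tgt.get(i)
        # Replace only source words with exactly one aligned target word,
        # which keeps the sketch simple and the sentence length unchanged.
        if targets and len(targets) == 1 and rng.random() < replace_prob:
            mixed.append(tgt_tokens[targets[0]])
        else:
            mixed.append(tok)
    return mixed


# Hypothetical aligned translation pair, for illustration only.
src = ["the", "red", "house"]
tgt = ["la", "casa", "roja"]
align = [(0, 0), (1, 2), (2, 1)]

print(random_lexical_replacement(src, tgt, align, replace_prob=0.5,
                                 rng=random.Random(0)))
```

Purely random replacement of this kind ignores word order and grammar, so it can produce switches that a bilingual speaker would not make. The EC-based generation mentioned in the abstract is what targets grammatical validity, and it is not implemented in this sketch.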