This paper describes our NPU-ASLP system submitted to the ISCSLP 2022 Magichub Code-Switching ASR Challenge. In this challenge, we first explore several popular end-to-end ASR architectures and training strategies, including the bi-encoder, the language-aware encoder (LAE), and mixture of experts (MoE). To improve our system's language modeling ability, we additionally apply an internal language model and a long-context language model. Given the limited training data in the challenge, we further investigate the effects of data augmentation, including speed perturbation, pitch shifting, speech codec augmentation, SpecAugment, and synthetic data from text-to-speech (TTS). Finally, we explore ROVER-based score fusion to make full use of complementary hypotheses from different models. Our submitted system achieves a mixed error rate (MER) of 16.87% on the test set, ranking 2nd in the challenge.