Code-switching (CS), the mixing of languages within a conversation, has become a worldwide phenomenon, and its prevalence has recently been met with growing demand for and interest in CS automatic speech recognition (ASR) systems. In this paper, we present our work on code-switched Egyptian Arabic-English ASR. We first contribute to filling the large gap in resources by collecting, analyzing, and publishing our spontaneous CS Egyptian Arabic-English speech corpus. We then build ASR systems using DNN-based hybrid and Transformer-based end-to-end models, and present a thorough comparison of the two approaches in the setting of a low-resource, orthographically unstandardized, and morphologically rich language pair. We show that while both systems give comparable overall recognition results, they have complementary strengths, and that recognition can be improved by combining their outputs. We propose several effective system combination approaches in which the hypotheses of both systems are merged at the sentence and word levels. Our approaches yield an overall relative word error rate (WER) improvement of 4.7% over a baseline of 32.1% WER; on intra-sentential CS sentences, the relative WER improvement is 4.8%. Our best-performing system achieves 30.6% WER on the ArzEn test set.
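To make the reported figures concrete, the following is a minimal sketch of how WER and a relative WER improvement are typically computed. It is an illustration under stated assumptions, not the scoring tool used in this work: the function names `word_error_rate` and `relative_improvement` are hypothetical, tokens are obtained by simple whitespace splitting, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: tokens are whitespace-separated; no normalization is done.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer reduces baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Figures taken from the abstract: 32.1% baseline, 30.6% best system.
    print(f"{relative_improvement(32.1, 30.6):.1%}")
```

With the two figures quoted in the abstract, (32.1 - 30.6) / 32.1 is approximately 0.047, which matches the reported 4.7% relative improvement.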