The dual-encoder structure successfully utilizes two language-specific encoders (LSEs) for code-switching speech recognition. Because the LSEs are initialized from two pre-trained language-specific models (LSMs), the dual-encoder structure can exploit abundant monolingual data and capture the attributes of each language. However, existing methods impose no language constraints on the LSEs and underutilize the language-specific knowledge of the LSMs. In this paper, we propose a language-specific characteristic assistance (LSCA) method to mitigate these problems. Specifically, during training we introduce two language-specific losses as language constraints and generate corresponding language-specific targets for them. During decoding, we exploit the decoding abilities of the LSMs by combining the output probabilities of the two LSMs and the mixture model to obtain the final predictions. Experiments show that either the training method or the decoding method of LSCA alone improves the model's performance, and combining the two yields up to a 15.4% relative error reduction on the code-switching test set. Moreover, with our method the system can handle code-switching speech recognition well based on two pre-trained LSMs, without extra shared parameters or even retraining.
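The sketch below illustrates the two LSCA ideas summarized above: adding language-specific losses as constraints during training, and interpolating the output probabilities of the mixture model and the two LSMs during decoding. All function names, the loss weight `alpha`, and the interpolation weights are illustrative assumptions (the abstract does not specify them), and the three models are assumed to emit frame-level posteriors over a shared merged vocabulary; this is a minimal sketch, not the paper's reference implementation.

```python
# Minimal LSCA sketch. Assumptions: three frame-synchronous models
# (mixture model + two pre-trained LSMs) producing posteriors over a
# shared merged vocabulary; weight values are placeholders.
import numpy as np

def lsca_loss(loss_mix, loss_zh, loss_en, alpha=0.3):
    """Training: add the two language-specific losses (computed against
    language-specific targets) to the mixture model's loss as constraints.
    `alpha` is an assumed balancing weight."""
    return loss_mix + alpha * (loss_zh + loss_en)

def lsca_combine(p_mix, p_zh, p_en, w_mix=0.6, w_zh=0.2, w_en=0.2):
    """Decoding: interpolate the per-frame output probabilities of the
    mixture model and the two LSMs, then renormalize.

    p_* : arrays of shape (frames, vocab) holding posterior probabilities.
    """
    assert abs(w_mix + w_zh + w_en - 1.0) < 1e-6
    p = w_mix * p_mix + w_zh * p_zh + w_en * p_en
    return p / p.sum(axis=-1, keepdims=True)  # guard against rounding drift
```

Because the combination happens only at the output-probability level, this decoding step needs no new shared parameters, which is consistent with the claim that two pre-trained LSMs can be used without retraining.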