Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR). Our prior work proposed momentum pseudo-labeling (MPL), which performs PL-based SSL via an interaction between online and offline models, inspired by the mean teacher framework. MPL achieves remarkable results in various semi-supervised settings, showing robustness to variations in the amount of data and the severity of domain mismatch. However, there is further room for improving the seed model used to initialize MPL training, since it is generally critical for a PL-based method to start training from high-quality pseudo-labels. To this end, we propose to enhance MPL by (1) introducing the Conformer architecture to boost the overall recognition accuracy and (2) exploiting iterative pseudo-labeling with a language model (LM) to improve the seed model before applying MPL. The experimental results demonstrate that the proposed approaches effectively improve MPL performance, outperforming other PL-based methods. We also present in-depth investigations into what makes our improvements effective, e.g., regarding the batch normalization typically used in Conformer and the quality of the LM.
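As a minimal sketch of the mean-teacher-style interaction underlying MPL, the offline model's weights can be maintained as an exponential moving average (EMA) of the online model's weights. The function name, plain-list parameter representation, and momentum coefficient `alpha` below are illustrative assumptions, not the paper's actual implementation:

```python
# Sketch of a mean-teacher-style momentum update: the offline (teacher)
# parameters track an EMA of the online (student) parameters. Parameters
# are plain floats here purely for illustration.

def momentum_update(offline_params, online_params, alpha=0.999):
    """Return EMA-updated offline parameters.

    alpha close to 1.0 makes the offline model change slowly, which is
    what stabilizes the pseudo-labels it generates for the online model.
    """
    return [alpha * off + (1.0 - alpha) * on
            for off, on in zip(offline_params, online_params)]

offline = [0.0, 0.0]
online = [1.0, 2.0]
offline = momentum_update(offline, online, alpha=0.9)
# offline moves a small step toward the online weights (~[0.1, 0.2])
```

In practice the same update would be applied element-wise to every tensor of the network after each online-model optimization step.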