Pseudo-labeling (PL) has been shown to be effective in semi-supervised automatic speech recognition (ASR), where a base model is self-trained with pseudo-labels generated from unlabeled data. While PL can be further improved by iteratively updating pseudo-labels as the model evolves, most previous approaches involve inefficient retraining of the model or intricate control of the label update. We present momentum pseudo-labeling (MPL), a simple yet effective strategy for semi-supervised ASR. MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method. The online model is trained to predict pseudo-labels generated on the fly by the offline model, while the offline model maintains a momentum-based moving average of the online model. MPL is performed in a single training process, and the interaction between the two models effectively helps them reinforce each other to improve ASR performance. We apply MPL to an end-to-end ASR model based on connectionist temporal classification (CTC). The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios with varying amounts of data or domain mismatch.
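The core mechanism described above can be illustrated with a minimal sketch: after each gradient update of the online (student) model, the offline (teacher) parameters are moved toward the online parameters by an exponential moving average. The function name, the momentum coefficient value, and the scalar-parameter simplification below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the momentum (EMA) update at the heart of MPL.
# Parameters are represented as plain lists of floats for clarity;
# in practice these would be model weight tensors.

def mpl_step(online_params, offline_params, alpha=0.999):
    """Move the offline (teacher) parameters toward the online
    (student) parameters via an exponential moving average with
    momentum coefficient alpha (hypothetical default value)."""
    return [alpha * p_off + (1.0 - alpha) * p_on
            for p_on, p_off in zip(online_params, offline_params)]

# Toy usage: with a small alpha the offline parameters quickly
# drift toward the online parameters.
online = [1.0, 2.0]
offline = [0.0, 0.0]
for _ in range(3):
    offline = mpl_step(online, offline, alpha=0.5)
print(offline)  # each entry approaches the corresponding online value
```

In the full training loop, the offline model would additionally decode the unlabeled audio after each update to produce fresh pseudo-labels for the online model's next step.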