Second-pass rescoring is an important component of automatic speech recognition (ASR) systems, used to improve the outputs of a first-pass decoder through lattice rescoring or $n$-best re-ranking. While pretraining with a masked language model (MLM) objective has achieved great success on various natural language understanding (NLU) tasks, it has not gained traction as a rescoring model for ASR. In particular, training a bidirectional model like BERT with a discriminative objective such as minimum WER (MWER) has not been explored. Here we show how to train a BERT-based rescoring model with MWER loss, incorporating the benefits of a discriminative loss into the fine-tuning of deep bidirectional pretrained models for ASR. Specifically, we propose a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from the pretrained model. We further propose an alternative discriminative loss. We name this approach RescoreBERT. On the LibriSpeech corpus, it reduces WER by 6.6%/3.4% relative on the clean/other test sets over a BERT baseline without a discriminative objective. We also evaluate our method on an internal dataset from a conversational agent and find that it reduces both latency and WER (by 3% to 8% relative) over an LSTM rescoring model.
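For readers unfamiliar with the MWER objective referenced above, the following is a minimal sketch of how an expected-word-error loss over an $n$-best list is commonly computed; the PyTorch function and tensor names are illustrative assumptions, not the exact implementation used in this work.

```python
import torch
import torch.nn.functional as F


def mwer_loss(first_pass_scores: torch.Tensor,
              rescore_scores: torch.Tensor,
              word_errors: torch.Tensor) -> torch.Tensor:
    """Expected-word-error (MWER-style) loss over an n-best list.

    All tensors have shape (batch, n_best). Scores are negative
    log-likelihoods, so lower means a better hypothesis.
    """
    # Combine first-pass and second-pass (rescoring) scores, then turn
    # them into a hypothesis-level posterior with a softmax over the list.
    total_scores = first_pass_scores + rescore_scores
    posteriors = F.softmax(-total_scores, dim=-1)

    # Subtracting the mean error count is a common variance-reduction
    # baseline; the loss is the expected (relative) number of word errors.
    baseline = word_errors.mean(dim=-1, keepdim=True)
    expected_errors = (posteriors * (word_errors - baseline)).sum(dim=-1)
    return expected_errors.mean()


# Toy usage: a batch of one utterance with a 3-best list.
if __name__ == "__main__":
    first_pass = torch.tensor([[12.0, 13.5, 14.0]])
    rescore = torch.tensor([[3.0, 2.5, 4.0]], requires_grad=True)
    errors = torch.tensor([[1.0, 0.0, 2.0]])
    loss = mwer_loss(first_pass, rescore, errors)
    loss.backward()  # gradients flow back into the rescoring model's scores
```

Minimizing this loss pushes the rescoring model to assign lower scores (higher posterior) to hypotheses with fewer word errors, which is the discriminative behavior the abstract describes.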