Large-scale language models (LLMs) such as GPT-2, BERT, and RoBERTa have been successfully applied to ASR N-best rescoring. However, whether or how they can benefit competitive, near state-of-the-art ASR systems remains unexplored. In this study, we incorporate LLM rescoring into one of the most competitive ASR baselines: the Conformer-Transducer model. We demonstrate that consistent improvements are achieved through the LLM's bidirectionality, pretraining, in-domain finetuning, and context augmentation. Furthermore, our lexical analysis sheds light on how each of these components may be contributing to ASR performance.
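As a concrete illustration of N-best rescoring, the minimal sketch below scores each first-pass hypothesis with an off-the-shelf GPT-2 model (via the HuggingFace `transformers` library) and linearly interpolates the LLM log-probability with the first-pass ASR score. Linear interpolation is one common rescoring formulation; the weight `lam` and the toy hypotheses are illustrative assumptions, not values or methods from this study.

```python
# Minimal sketch of LLM N-best rescoring, assuming HuggingFace
# `transformers`. The weight `lam` is a hypothetical tuning knob.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def llm_log_prob(text: str) -> float:
    """Total log-probability of `text` under the LLM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean
        # next-token cross-entropy; scale back to a summed log-prob.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

def rescore(nbest, lam=0.3):
    """nbest: list of (hypothesis, asr_log_score) pairs.
    Returns the hypothesis with the best interpolated score."""
    return max(
        nbest,
        key=lambda h: (1 - lam) * h[1] + lam * llm_log_prob(h[0]),
    )[0]

# Toy example: choose between two first-pass hypotheses.
hyps = [("i scream for ice cream", -4.2),
        ("eye scream for ice cream", -4.0)]
print(rescore(hyps))
```

In this formulation, a bidirectional model such as BERT or RoBERTa would replace the left-to-right log-probability with a pseudo-log-likelihood, and in-domain finetuning or context augmentation would change the model and its input rather than the interpolation itself.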