More recently, Bidirectional Encoder Representations from Transformers (BERT) was proposed and has achieved impressive success on many natural language processing (NLP) tasks, such as question answering and language understanding, due mainly to its effective pre-training then fine-tuning paradigm and its strong local contextual modeling ability. In view of the above, this paper presents a novel instantiation of BERT-based contextualized language models (LMs) for reranking the N-best hypotheses produced by automatic speech recognition (ASR). To this end, we frame N-best hypothesis reranking with BERT as a prediction problem that, given the N-best hypotheses, aims to predict the oracle hypothesis with the lowest word error rate (WER); we denote this method PBERT. In addition, we explore capitalizing on task-specific global topic information in an unsupervised manner to assist PBERT in N-best hypothesis reranking (denoted TPBERT). Extensive experiments conducted on the AMI benchmark corpus demonstrate the effectiveness and feasibility of our methods in comparison with conventional autoregressive models such as recurrent neural networks (RNNs) and with a recently proposed method that employs BERT to compute pseudo-log-likelihood (PLL) scores for N-best hypothesis reranking.
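To make the prediction framing concrete, the following is a minimal sketch (not the authors' released implementation) of how N-best reranking can be cast as predicting the oracle hypothesis with BERT. It assumes the HuggingFace `transformers` and `torch` packages; the class name `NBestReranker`, the toy N-best list, and the choice of a linear scorer over the [CLS] embedding are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: score each of the N hypotheses with BERT and train the model
# to pick the oracle (lowest-WER) hypothesis; names and data are illustrative.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class NBestReranker(nn.Module):
    """Assigns one score per hypothesis; trained to select the oracle hypothesis."""
    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.scorer = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, hypotheses, tokenizer):
        # Encode all N hypotheses of one utterance as a batch.
        enc = tokenizer(hypotheses, padding=True, truncation=True,
                        return_tensors="pt")
        cls = self.bert(**enc).last_hidden_state[:, 0]  # [CLS] vectors, shape (N, H)
        return self.scorer(cls).squeeze(-1)             # one score per hypothesis, (N,)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = NBestReranker()

# Toy N-best list; index 1 is assumed to be the oracle (lowest-WER) hypothesis.
nbest = ["the meeting starts at nine",
         "the meeting starts at nine am",
         "the meeting start at nine am"]
oracle_index = torch.tensor([1])

scores = model(nbest, tokenizer)
# Training objective: cross-entropy over the N-best list against the oracle index.
loss = nn.functional.cross_entropy(scores.unsqueeze(0), oracle_index)
# At inference time, output the highest-scoring hypothesis.
best_hypothesis = nbest[scores.argmax().item()]
```

Under this framing, reranking reduces to a classification over the N candidates of each utterance; the topic-aware variant (TPBERT) would additionally condition the scorer on unsupervised global topic features, which are omitted from this sketch.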