Residual Energy-Based Models for End-to-End Speech Recognition

End-to-end models with auto-regressive decoders have shown impressive results for automatic speech recognition (ASR). These models formulate the sequence-level probability as a product of the conditional probabilities of all individual tokens given their histories. However, the performance of locally normalised models can be sub-optimal because of factors such as exposure bias. Consequently, the model distribution differs from the underlying data distribution. In this paper, the residual energy-based model (R-EBM) is proposed to complement the auto-regressive ASR model and close the gap between the two distributions. R-EBMs can also be regarded as utterance-level confidence estimators, which may benefit many downstream tasks. Experiments on a 100-hour LibriSpeech dataset show that R-EBMs can reduce the word error rates (WERs) by 8.2%/6.7% while improving the areas under the precision-recall curves of confidence scores by 12.6%/28.4% on the test-clean/test-other sets. Furthermore, on a state-of-the-art model using self-supervised learning (wav2vec 2.0), R-EBMs still significantly improve both the WER and confidence estimation performance.
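The abstract does not spell out how the R-EBM combines with the auto-regressive model, so the following is only a minimal sketch under stated assumptions. It assumes the common residual form in which a hypothesis is scored by log p_AR(y|x) - E(x, y), and it applies that score to re-rank an n-best list. The names `sequence_log_prob`, `rescore_nbest`, and `energy_fn` are hypothetical and do not come from the paper; `energy_fn` stands in for a trained energy network with the acoustic input held fixed.

```python
import math


def sequence_log_prob(token_log_probs):
    """Log of the product of per-token conditional probabilities.

    Summing log-probabilities is equivalent to multiplying the
    conditional probabilities of each token given its history.
    """
    return sum(token_log_probs)


def rescore_nbest(hypotheses, energy_fn):
    """Re-rank hypotheses by log p_AR(y|x) - E(x, y), best first.

    hypotheses: list of (text, token_log_probs) pairs from the
        auto-regressive decoder.
    energy_fn: hypothetical callable returning a scalar energy for a
        hypothesis text; lower energy means more plausible. This is an
        assumption for illustration, not the paper's model.
    """
    scored = []
    for text, token_log_probs in hypotheses:
        score = sequence_log_prob(token_log_probs) - energy_fn(text)
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy usage: the auto-regressive scores alone prefer "a cat sad",
# but the (made-up) energies flip the ranking.
toy_nbest = [
    ("a cat sat", [math.log(0.5), math.log(0.5), math.log(0.5)]),
    ("a cat sad", [math.log(0.5), math.log(0.5), math.log(0.6)]),
]
toy_energy = {"a cat sat": 0.0, "a cat sad": 1.0}
ranked = rescore_nbest(toy_nbest, lambda text: toy_energy[text])
```

Under the same assumption, the negated energy of the top hypothesis could also serve as an utterance-level confidence score, which is the second use the abstract mentions.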
