Language understanding in speech-based systems has attracted much attention in recent years with the growing demand for voice-interface applications. However, the robustness of natural language understanding (NLU) systems to errors introduced by automatic speech recognition (ASR) remains under-examined. In this paper, we propose the ASR-GLUE benchmark, a new collection of 6 different NLU tasks for evaluating the performance of models under ASR errors across 3 different levels of background noise and 6 speakers with varying voice characteristics. Based on the proposed benchmark, we systematically investigate the effect of ASR errors on NLU tasks in terms of noise intensity, error type, and speaker variability. We further propose two approaches to improve the robustness of NLU systems: a correction-based method and a data augmentation-based method. Extensive experimental results and analyses show that the proposed methods are effective to some extent but still fall far short of human performance, demonstrating that NLU under ASR errors remains very challenging and requires further research.