Robustness Testing of Language Understanding in Task-Oriented Dialog

Most language understanding models in task-oriented dialog systems are trained on a small amount of annotated data and evaluated on a small test set drawn from the same distribution. In practice, however, these models can cause system failures or undesirable outputs when exposed to natural language perturbation or variation. In this paper, we conduct a comprehensive evaluation and analysis of the robustness of natural language understanding models, and introduce three important aspects of language understanding in real-world dialog systems: language variety, speech characteristics, and noise perturbation. We propose LAUG, a model-agnostic toolkit that approximates natural language perturbations in order to test robustness in task-oriented dialog. LAUG assembles four data augmentation approaches covering the three aspects, and these reveal critical robustness issues in state-of-the-art models. The datasets augmented with LAUG can be used to facilitate future research on robustness testing of language understanding in task-oriented dialog.
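To make the idea of model-agnostic robustness testing concrete, the following is a minimal sketch of one possible perturb-and-compare loop. It is not LAUG's API: `add_disfluency`, `robustness_drop`, `toy_intent_model`, and the filler list are all hypothetical names chosen for illustration, and filler insertion is only one simple stand-in for a speech-related perturbation. The sketch treats the model as a black-box function from utterance to label, which is the sense in which such a test is model-agnostic.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical filler words; a real toolkit would use a richer inventory.
FILLERS = ["uh", "um", "you know"]

Example = Tuple[str, str]  # (utterance, gold intent label)


def add_disfluency(utterance: str, rng: random.Random, rate: float = 0.3) -> str:
    """Insert a filler before each token with probability `rate`."""
    out: List[str] = []
    for tok in utterance.split():
        if rng.random() < rate:
            out.append(rng.choice(FILLERS))
        out.append(tok)
    return " ".join(out)


def accuracy(model: Callable[[str], str], data: List[Example]) -> float:
    """Fraction of utterances whose predicted label matches the gold label."""
    return sum(model(u) == y for u, y in data) / len(data)


def robustness_drop(
    model: Callable[[str], str],
    data: List[Example],
    perturb: Callable[[str, random.Random], str],
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (clean accuracy, accuracy on perturbed copies of the same data).

    Labels are kept unchanged, which assumes the perturbation preserves meaning.
    """
    rng = random.Random(seed)
    perturbed = [(perturb(u, rng), y) for u, y in data]
    return accuracy(model, data), accuracy(model, perturbed)


def toy_intent_model(utterance: str) -> str:
    """Deliberately brittle stand-in model: keys on the first word only."""
    if utterance.startswith("book"):
        return "book"
    if utterance.startswith("find"):
        return "find"
    return "unknown"


if __name__ == "__main__":
    data = [("book a table for two", "book"), ("find a cheap hotel", "find")]
    clean, noisy = robustness_drop(toy_intent_model, data, add_disfluency)
    print(f"clean accuracy: {clean:.2f}, perturbed accuracy: {noisy:.2f}")
```

The gap between the two accuracies is the quantity of interest: a model that scores well on the clean set but poorly on the perturbed copies is sensitive to that kind of variation. Because the perturbation receives a seeded generator, a given test run can be reproduced exactly.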
