Goal-oriented dialogue systems that interact in real-world environments often encounter noisy data. In this work, we investigate how robust goal-oriented dialogue systems are to such noise. Specifically, our analysis considers intent classification (IC) and slot labeling (SL) models that form the basis of most dialogue systems. We collect a test suite for six common phenomena found in live human-to-bot conversations (abbreviations, casing, misspellings, morphological variants, paraphrases, and synonyms) and show that these phenomena can degrade the IC/SL performance of state-of-the-art BERT-based models. Through synthetic data augmentation, we improve the robustness of IC/SL models to real-world noise by +11.5 points for IC and +17.3 points for SL on average across noise types. We make our suite of noisy test data public to enable further research into the robustness of dialogue systems.
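To make the augmentation idea concrete, the following is a minimal Python sketch of injecting two of the six noise types (casing and misspellings) into training utterances; the function names, the adjacent-character-swap heuristic, and the perturbation rate are illustrative assumptions, not the paper's exact procedure.

```python
import random

random.seed(0)  # fixed seed for a reproducible illustration

def perturb_casing(utterance: str) -> str:
    # Simulate users who ignore capitalization (illustrative assumption).
    return utterance.lower()

def perturb_misspelling(utterance: str, rate: float = 0.1) -> str:
    # Swap adjacent letters at random as a simple proxy for typos.
    # Only letter-letter swaps are allowed, so whitespace (and hence
    # token boundaries, which SL tags align to) is never disturbed.
    chars = list(utterance)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and random.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def augment(training_utterances):
    # Return the clean utterances plus one noisy copy per noise type,
    # for use as extra IC/SL training data.
    augmented = []
    for utt in training_utterances:
        augmented.append(utt)
        augmented.append(perturb_casing(utt))
        augmented.append(perturb_misspelling(utt))
    return augmented

print(augment(["Book a Flight to Boston"]))
# e.g. ['Book a Flight to Boston', 'book a flight to boston', 'Book a Flgiht to Boston']
```

Because the swaps stay within individual words, word-level slot labels remain aligned with their tokens after perturbation, which is what makes this style of augmentation usable for SL as well as IC.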