Slot-filling and intent detection are the backbone of conversational agents such as voice assistants, and both are active areas of research. Although state-of-the-art techniques achieve impressive performance on publicly available benchmarks, their ability to generalize to realistic scenarios has yet to be demonstrated. In this work, we present NATURE, a set of simple spoken-language-oriented transformations that are applied to the evaluation sets of datasets to introduce human spoken-language variations while preserving the semantics of an utterance. We apply NATURE to common slot-filling and intent detection benchmarks and demonstrate that simple perturbations of the standard evaluation set can significantly degrade model performance. In our experiments, applying NATURE operators to the evaluation sets of popular benchmarks reduces model accuracy by up to 40%.
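To make the idea of a semantics-preserving perturbation concrete, the following is a minimal sketch of one possible spoken-language operator: inserting a filler word into a BIO-tagged utterance. It is an illustration only and is not taken from NATURE itself; the operator choice, the `FILLERS` list, and the function name `insert_filler` are all assumptions made here. The inserted token receives the `O` tag and is never placed inside a slot span, so the slot labels and the intent of the utterance are left unchanged.

```python
import random
from typing import List, Tuple

# Hypothetical filler vocabulary; the abstract does not list the actual operators.
FILLERS = ["uh", "um", "like"]


def insert_filler(
    tokens: List[str], tags: List[str], rng: random.Random
) -> Tuple[List[str], List[str]]:
    """Insert one filler word without splitting any BIO slot span.

    Illustrative sketch only, not the NATURE implementation.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    # A position is safe if the token that would follow the filler does not
    # continue a span (no "I-" tag), or if the filler is appended at the end.
    safe_positions = [
        i
        for i in range(len(tokens) + 1)
        if i == len(tokens) or not tags[i].startswith("I-")
    ]
    pos = rng.choice(safe_positions)
    filler = rng.choice(FILLERS)

    # The filler carries no slot information, so it is tagged "O".
    new_tokens = tokens[:pos] + [filler] + tokens[pos:]
    new_tags = tags[:pos] + ["O"] + tags[pos:]
    return new_tokens, new_tags


if __name__ == "__main__":
    utterance = ["book", "a", "flight", "to", "new", "york"]
    labels = ["O", "O", "O", "O", "B-city", "I-city"]
    print(insert_filler(utterance, labels, random.Random(0)))
```

Under these assumptions, a model would be scored on the perturbed evaluation utterances against the same gold intent and the shifted slot tags, and any drop relative to the unperturbed evaluation set would indicate sensitivity to this kind of variation.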