Intent classification is a major task in spoken language understanding (SLU). Since most models are built with pre-collected in-domain (IND) training utterances, their ability to detect unsupported out-of-domain (OOD) utterances is critical for practical use. Recent work has shown that using extra data and labels can improve OOD detection performance, yet collecting such data can be costly. This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection. Our method introduces a novel domain-regularized module (DRM) to reduce the overconfidence of a vanilla classifier, achieving better generalization in both cases. Moreover, DRM can be used as a drop-in replacement for the last layer of any neural network-based intent classifier, providing a low-cost strategy for a significant improvement. Evaluation on four datasets shows that our method, built on BERT and RoBERTa models, achieves state-of-the-art performance against existing approaches and the strong baselines we created for the comparisons.
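To make the "drop-in replacement for the last layer" concrete, below is a minimal illustrative sketch of what such a head could look like in PyTorch. The specific design here is an assumption, not the paper's DRM: it replaces the usual `nn.Linear(hidden, K)` with two branches, a class-logit branch and a scalar domain branch whose output is subtracted from every class logit, so that uniformly inflated (overconfident) logits are discouraged. The names `DRMHead`, `in_dim`, and `num_intents` are hypothetical placeholders.

```python
import torch
import torch.nn as nn


class DRMHead(nn.Module):
    """Hypothetical drop-in replacement for the final classification layer.

    Sketch under assumptions: a class-logit branch plus a scalar domain
    branch; the domain output is subtracted from every class logit to act
    as a regularizer against overconfident predictions.
    """

    def __init__(self, in_dim: int, num_intents: int):
        super().__init__()
        self.class_branch = nn.Linear(in_dim, num_intents)
        self.domain_branch = nn.Linear(in_dim, 1)

    def forward(self, h: torch.Tensor):
        class_logits = self.class_branch(h)      # (batch, num_intents)
        domain_logit = self.domain_branch(h)     # (batch, 1)
        logits = class_logits - domain_logit     # domain-regularized logits
        return logits, domain_logit.squeeze(-1)


# Example usage with a sentence encoder such as BERT or RoBERTa:
# swap the classifier head for DRMHead(hidden_size, num_intents),
# train with the usual cross-entropy on `logits`, and at test time
# use e.g. the maximum softmax probability (or the domain score)
# to flag out-of-domain utterances.
encoder_hidden = torch.randn(8, 768)             # stand-in for pooled [CLS] features
head = DRMHead(in_dim=768, num_intents=10)
logits, ood_score = head(encoder_hidden)
```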