Pretrained transformers achieve remarkable performance when the test data follows the same distribution as the training data. However, in real-world NLU tasks, the model often faces out-of-distribution (OoD) instances. Such instances can cause severe semantic shift problems at inference time; hence, the model should identify and reject them. In this paper, we study the OoD detection problem for pretrained transformers using only in-distribution data in training. We observe that such instances can be detected using the Mahalanobis distance in the penultimate layer. We further propose a contrastive loss that improves the compactness of representations, such that OoD instances can be better differentiated from in-distribution ones. Experiments on the GLUE benchmark demonstrate the effectiveness of the proposed methods.
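To make the two ingredients named above concrete, the following is a minimal sketch (not the authors' released implementation) of Mahalanobis-distance OoD scoring on penultimate-layer features with class-conditional means and a shared covariance, together with a supervised contrastive loss that tightens same-class clusters. All function names, the temperature value, and the tied-covariance estimate are illustrative assumptions.

```python
# Minimal sketch: Mahalanobis-distance OoD scoring + supervised contrastive loss.
# Assumes `features` is an (N, d) tensor of penultimate-layer representations of
# in-distribution training data and `labels` their class ids. Names and the
# temperature `tau` are illustrative, not taken from the paper.
import torch
import torch.nn.functional as F

def fit_gaussian(features: torch.Tensor, labels: torch.Tensor):
    """Fit class-conditional Gaussians with a shared (tied) covariance."""
    classes = labels.unique()  # sorted unique class ids
    means = torch.stack([features[labels == c].mean(dim=0) for c in classes])
    centered = features - means[torch.searchsorted(classes, labels)]
    cov = centered.T @ centered / features.size(0)
    precision = torch.linalg.pinv(cov)  # pseudo-inverse for numerical stability
    return means, precision

def mahalanobis_score(x: torch.Tensor, means: torch.Tensor, precision: torch.Tensor):
    """OoD score: minimum Mahalanobis distance to any class mean (lower = more in-distribution)."""
    diffs = x.unsqueeze(1) - means.unsqueeze(0)                    # (B, C, d)
    dists = torch.einsum("bcd,de,bce->bc", diffs, precision, diffs)
    return dists.min(dim=1).values                                 # (B,)

def supervised_contrastive_loss(features: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """Pull same-label representations together to improve compactness of the clusters."""
    z = F.normalize(features, dim=-1)
    sim = z @ z.T / tau
    sim.fill_diagonal_(float("-inf"))                              # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float().fill_diagonal_(0)
    return -(pos_mask * log_prob).sum(dim=1).div(pos_mask.sum(dim=1).clamp(min=1)).mean()
```

At test time, an instance would be flagged as OoD if its `mahalanobis_score` exceeds a threshold calibrated on held-out in-distribution data; the contrastive term would be added to the task loss during fine-tuning.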