Detecting out-of-distribution (OOD) instances is important for the safe deployment of NLP models. Among recent textual OOD detection works based on pretrained language models (PLMs), distance-based methods have shown superior performance. However, they estimate sample distance scores in the last-layer CLS embedding space and thus do not make full use of the linguistic information encoded in PLMs. To address this issue, we propose to boost OOD detection by deriving more holistic sentence embeddings. Based on the observations that both token averaging and layer combination contribute to improving OOD detection, we propose a simple embedding approach named Avg-Avg, which averages all token representations from each intermediate layer as the sentence embedding and significantly surpasses the state-of-the-art on a comprehensive suite of benchmarks by a 9.33% FAR95 margin. Furthermore, our analysis demonstrates that it indeed helps preserve general linguistic knowledge in fine-tuned PLMs and substantially benefits the detection of background shifts. This simple yet effective embedding method can be applied to fine-tuned PLMs with negligible extra cost, providing a free gain in OOD detection. Our code is available at https://github.com/lancopku/Avg-Avg.
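Below is a minimal sketch of the Avg-Avg embedding described above, assuming a Hugging Face transformers backbone; the model name, function name, and the choice to exclude the input embedding layer are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch: Avg-Avg sentence embedding, i.e., average token representations
# within each intermediate layer, then average across layers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def avg_avg_embedding(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # hidden_states: tuple of (num_layers + 1) tensors, each (1, seq_len, hidden);
    # we drop the first entry (input embeddings) -- an assumption about which
    # layers count as "intermediate".
    hidden = torch.stack(outputs.hidden_states[1:])          # (L, 1, seq_len, H)
    mask = inputs["attention_mask"].unsqueeze(0).unsqueeze(-1)  # (1, 1, seq_len, 1)
    # Step 1: average over non-padding tokens within each layer.
    token_avg = (hidden * mask).sum(dim=2) / mask.sum(dim=2)    # (L, 1, H)
    # Step 2: average across layers to obtain the sentence embedding.
    return token_avg.mean(dim=0).squeeze(0)                     # (H,)
```

The resulting vector would replace the last-layer CLS embedding as the input to a distance-based OOD score (e.g., Mahalanobis distance over in-distribution training features).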