Modern neural networks can assign high confidence to inputs drawn from outside the training distribution, which poses a threat to models deployed in the real world. While much research attention has been devoted to designing new out-of-distribution (OOD) detection methods, the precise definition of OOD is often left vague and falls short of the notion of OOD that matters in practice. In this paper, we present a new formalization that models data shifts by taking into account both invariant and environmental (spurious) features. Under this formalization, we systematically investigate how spurious correlation in the training set impacts OOD detection. Our results suggest that detection performance degrades severely as the correlation between spurious features and labels increases in the training set. We further provide insights into which detection methods are more effective at reducing the impact of spurious correlation, and we provide a theoretical analysis of why reliance on environmental features leads to high OOD detection error. Our work aims to facilitate a better understanding of OOD samples and their formalization, as well as the exploration of methods that enhance OOD detection.