Out-of-distribution (OOD) detection is indispensable for safely deploying machine learning models in the wild. One of the key challenges is that models lack supervision signals from unknown data and, as a result, can produce overconfident predictions on OOD data. Recent work on outlier synthesis models the feature space as a parametric Gaussian distribution, a strong and restrictive assumption that may not hold in practice. In this paper, we propose a novel framework, Non-Parametric Outlier Synthesis (NPOS), which generates artificial OOD training data and facilitates learning a reliable decision boundary between in-distribution (ID) and OOD data. Importantly, our synthesis approach makes no distributional assumption on the ID embeddings, thereby offering strong flexibility and generality. We show that our synthesis approach can be mathematically interpreted as a rejection sampling framework. Extensive experiments show that NPOS achieves superior OOD detection performance, outperforming competitive baselines by a significant margin. Code is publicly available at https://github.com/deeplearning-wisc/npos.
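To make the idea of distribution-free outlier synthesis with an accept/reject step concrete, the following is a minimal sketch, not the authors' implementation (see the linked repository for that). The abstract does not state how outliers are proposed or filtered, so everything specific here is an assumption made for illustration: the k-nearest-neighbor distance as a non-parametric density proxy, Gaussian perturbation of low-density ID points as the proposal, and the hypothetical names and parameters `knn_distance`, `synthesize_outliers`, `k`, `n_boundary`, `n_candidates`, `sigma`, and `n_keep`.

```python
import numpy as np


def knn_distance(queries, reference, k, exclude_self=False):
    """Euclidean distance from each query to its k-th nearest reference point.

    A large value suggests the query lies in a low-density region of the
    reference set. No parametric density is fitted.
    """
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # (n_queries, n_reference)
    dists.sort(axis=1)
    # When queries and reference are the same set, column 0 is the point itself.
    return dists[:, k if exclude_self else k - 1]


def synthesize_outliers(id_embeddings, k=5, n_boundary=20, n_candidates=50,
                        sigma=0.5, n_keep=100, seed=0):
    """Illustrative propose-then-reject outlier synthesis (assumed procedure)."""
    rng = np.random.default_rng(seed)
    dim = id_embeddings.shape[1]

    # 1. Pick ID points with the largest k-NN distance (assumed boundary points).
    id_scores = knn_distance(id_embeddings, id_embeddings, k, exclude_self=True)
    boundary = id_embeddings[np.argsort(id_scores)[-n_boundary:]]

    # 2. Propose candidates by adding Gaussian noise around each boundary point.
    noise = rng.normal(scale=sigma, size=(n_boundary, n_candidates, dim))
    candidates = (boundary[:, None, :] + noise).reshape(-1, dim)

    # 3. Reject candidates that fall in dense ID regions: keep only the
    #    n_keep candidates with the largest k-NN distance to the ID set.
    cand_scores = knn_distance(candidates, id_embeddings, k)
    return candidates[np.argsort(cand_scores)[-n_keep:]]


if __name__ == "__main__":
    # Toy ID embeddings; a real pipeline would use features from a trained encoder.
    toy_rng = np.random.default_rng(1)
    toy_id = toy_rng.normal(size=(200, 8))
    print(synthesize_outliers(toy_id).shape)
```

The retained points could then serve as artificial OOD examples when training a binary ID-versus-OOD objective. The brute-force pairwise distances are only suitable for small toy sets; a nearest-neighbor index would be the natural replacement at scale.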