Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild. Recent methods use auxiliary outlier data to regularize the model for improved OOD detection. However, these approaches make a strong distributional assumption: that the auxiliary outlier data is completely separable from the in-distribution (ID) data. In this paper, we propose a novel framework that leverages wild mixture data, which naturally consists of both ID and OOD samples. Such wild data is abundant and arises freely when a machine learning classifier is deployed in its natural habitat. Our key idea is to formulate a constrained optimization problem and to show how to solve it tractably. Our learning objective maximizes the OOD detection rate, subject to constraints on the classification error of ID data and on the OOD error rate of ID examples (the fraction of ID examples wrongly flagged as OOD). We extensively evaluate our approach on common OOD detection tasks and demonstrate superior performance.
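
The constrained problem described above can be sketched as follows. This is only an illustrative formalization, and all notation is assumed here rather than taken from the abstract: $f_\theta$ is the classifier, $g_\theta$ is an OOD detector that outputs "in" or "out", $P_{\text{in}}$ is the ID distribution, $P_{\text{wild}}$ is the unlabeled wild mixture, and $\alpha$, $\tau$ are user-chosen tolerances. Because the wild data carries no ID/OOD labels, the sketch uses the fraction of wild samples declared ID as a stand-in for the OOD detection rate to be maximized.

```latex
% Illustrative sketch only; symbols are assumptions, not the paper's notation.
\begin{aligned}
\min_{\theta}\quad
  & \mathbb{E}_{x \sim P_{\text{wild}}}
    \big[\mathbf{1}\{g_\theta(x) = \text{in}\}\big]
  && \text{(wild samples declared ID)} \\
\text{s.t.}\quad
  & \mathbb{E}_{x \sim P_{\text{in}}}
    \big[\mathbf{1}\{g_\theta(x) = \text{out}\}\big] \le \alpha
  && \text{(ID examples flagged as OOD)} \\
  & \mathbb{E}_{(x,y) \sim P_{\text{in}}}
    \big[\mathbf{1}\{f_\theta(x) \neq y\}\big] \le \tau
  && \text{(ID classification error)}
\end{aligned}
```

Under this reading, the first constraint prevents the trivial solution of flagging everything as OOD, and the second keeps the classifier accurate on ID data. The indicator functions are not differentiable, so a tractable version would presumably replace them with smooth surrogate losses.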