Out-of-distribution (OOD) detection is indispensable for machine learning models deployed in the open world. Recently, the use of an auxiliary outlier dataset during training (also known as outlier exposure) has shown promising performance. As the sample space for potential OOD data can be prohibitively large, sampling informative outliers is essential. In this work, we propose a novel posterior sampling-based outlier mining framework, POEM, which facilitates efficient use of outlier data and promotes learning a compact decision boundary between in-distribution (ID) and OOD data for improved detection. We show that POEM establishes state-of-the-art performance on common benchmarks. Compared to the current best method, which uses a greedy sampling strategy, POEM achieves relative FPR95 improvements of 42.0% and 24.2% on CIFAR-10 and CIFAR-100, respectively. We further provide theoretical insights into the effectiveness of POEM for OOD detection.
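For readers unfamiliar with the metric, FPR95 is commonly understood as the false positive rate on OOD samples at the threshold where 95% of ID samples are correctly retained; lower is better. The following is a minimal sketch of that standard definition and of a relative improvement calculation, not the evaluation code behind the reported numbers. The function names and the convention that a higher score means "more in-distribution" are assumptions made for illustration.

```python
import numpy as np


def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR on OOD data at the threshold that retains ~95% of ID data.

    Assumption: a higher score means the sample looks more in-distribution.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # The 5th percentile of ID scores: about 95% of ID samples score at or above it.
    threshold = np.percentile(id_scores, 5)
    # OOD samples at or above the threshold are wrongly accepted as ID.
    return float(np.mean(ood_scores >= threshold))


def relative_improvement(baseline_fpr, new_fpr):
    """Fraction by which new_fpr reduces baseline_fpr (hypothetical helper)."""
    return (baseline_fpr - new_fpr) / baseline_fpr


# Illustrative values only, not results from the paper.
id_scores = np.linspace(0.0, 1.0, 101)
ood_scores = np.array([0.0, 0.01, 0.02, 0.9])
fpr = fpr_at_95_tpr(id_scores, ood_scores)
gain = relative_improvement(0.5, 0.25)
```

Under this reading, a relative improvement of 42.0% means the new method's FPR95 is 58.0% of the baseline's, rather than 42 percentage points lower.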