Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric Perspective

Out-of-distribution (OOD) detection methods assume access to test ground truths, i.e., whether individual test samples are in-distribution (IND) or OOD. However, in the real world, we do not always have such ground truths, so we do not know which samples are correctly detected and cannot compute metrics such as AUROC to evaluate the performance of different OOD detection methods. In this paper, we are the first to introduce the unsupervised evaluation problem in OOD detection, which aims to evaluate OOD detection methods in real-world changing environments without OOD labels. We propose three methods to compute Gscore as an unsupervised indicator of OOD detection performance. We further introduce a new benchmark, Gbench, which contains 200 real-world OOD datasets with various label spaces, to train and evaluate our method. Through experiments, we find a strong quantitative correlation between Gscore and OOD detection performance. Extensive experiments demonstrate that Gscore achieves state-of-the-art performance. Gscore also generalizes well across different IND/OOD datasets, OOD detection methods, backbones, and dataset sizes. We further analyze the effects of backbones and IND/OOD datasets on OOD detection performance. The data and code will be made available.
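To make concrete why a metric like AUROC depends on ground truth, the following is a minimal sketch of the supervised computation. It is not the Gscore method from the paper; the function name `auroc`, the score convention (higher means more likely OOD), and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def auroc(scores: Sequence[float], is_ood: Sequence[bool]) -> float:
    """AUROC as the probability that a randomly chosen OOD sample
    receives a higher score than a randomly chosen IND sample.
    Ties count as half a win. Assumes higher score = more likely OOD."""
    ood = [s for s, y in zip(scores, is_ood) if y]
    ind = [s for s, y in zip(scores, is_ood) if not y]
    if not ood or not ind:
        raise ValueError("AUROC needs at least one IND and one OOD sample")
    wins = 0.0
    for o in ood:
        for i in ind:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ood) * len(ind))


# Hypothetical detector scores and hypothetical ground-truth labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [False, False, False, True, True, True]
print(auroc(scores, labels))
```

The `is_ood` argument is exactly the per-sample ground truth that the abstract says is often unavailable in deployment; without it the function cannot be called, which is the gap an unsupervised indicator is meant to fill.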

