We ask the following question: what training information is required to design an effective outlier/out-of-distribution (OOD) detector, i.e., one that detects samples lying far away from the training distribution? Since unlabeled data is easily accessible for many applications, the most compelling approach is to develop detectors based only on unlabeled in-distribution data. However, we observe that most existing detectors based on unlabeled data perform poorly, often no better than random prediction. In contrast, existing state-of-the-art OOD detectors achieve impressive performance but require access to fine-grained data labels for supervised training. We propose SSD, an outlier detector based only on unlabeled in-distribution data. We use self-supervised representation learning followed by Mahalanobis-distance-based detection in the feature space. We demonstrate that SSD outperforms most existing detectors based on unlabeled data by a large margin. Additionally, SSD achieves performance on par with, and sometimes even better than, detectors based on supervised training. Finally, we expand our detection framework with two key extensions. First, we formulate few-shot OOD detection, in which the detector has access to only one to five samples from each class of the targeted OOD dataset. Second, we extend our framework to incorporate training data labels, if available. We find that our detection framework based on SSD displays enhanced performance with these extensions and achieves state-of-the-art results. Our code is publicly available at https://github.com/inspire-group/SSD.
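The Mahalanobis-distance scoring step mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it fits a single Gaussian (mean and covariance) to in-distribution features, whereas SSD clusters the features first; the function name `mahalanobis_ood_score` is our own.

```python
import numpy as np

def mahalanobis_ood_score(train_feats, test_feats):
    """Score test samples by squared Mahalanobis distance to the
    in-distribution feature statistics (higher = more likely OOD)."""
    # Fit mean and covariance on in-distribution features only.
    mean = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    # Pseudo-inverse for numerical stability with near-singular covariance.
    prec = np.linalg.pinv(cov)
    diff = test_feats - mean
    # Squared Mahalanobis distance for each test sample.
    return np.einsum('ij,jk,ik->i', diff, prec, diff)

# Usage: a point near the training mean should score lower than a distant one.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))            # stand-in for learned features
test = np.vstack([np.zeros((1, 8)),          # in-distribution-like sample
                  np.full((1, 8), 10.0)])    # far-away (OOD-like) sample
scores = mahalanobis_ood_score(train, test)
```

In the full method, these features come from a network trained with self-supervised contrastive learning, and a threshold on the score determines the OOD decision.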