We present a comprehensive experimental study of pretrained feature extractors for visual out-of-distribution (OOD) detection. We examine several setups that differ in the availability of labels or image captions and in the combinations of in- and out-distributions used. Intriguingly, we find that (i) contrastive language-image pretrained models achieve state-of-the-art unsupervised OOD detection performance when nearest-neighbor feature similarity is used as the OOD detection score, (ii) state-of-the-art supervised OOD detection performance can be obtained without in-distribution fine-tuning, and (iii) even top-performing billion-scale vision transformers trained with natural language supervision fail to detect adversarially manipulated OOD images. Finally, based on our experiments, we discuss whether new benchmarks for visual anomaly detection are needed. Using the largest publicly available vision transformer, we achieve state-of-the-art performance across all $18$ reported OOD benchmarks, including an AUROC of 87.6\% (a 9.2\% gain, unsupervised) and 97.4\% (a 1.2\% gain, supervised) on the challenging task of CIFAR100 $\rightarrow$ CIFAR10 OOD detection. The code will be open-sourced.
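As a concrete illustration of the unsupervised score in (i), the following is a minimal sketch of nearest-neighbor feature-similarity OOD scoring on frozen pretrained features. The random feature arrays, function name, and choice of $k$ are illustrative stand-ins, not the paper's exact configuration; in practice the features would come from a frozen pretrained encoder such as a CLIP vision transformer.

```python
import numpy as np

def knn_ood_score(train_feats, test_feats, k=1):
    """Nearest-neighbor feature-similarity OOD score (illustrative sketch).

    Features are L2-normalized so the inner product equals cosine
    similarity; each test image is scored by its mean similarity to the
    k closest in-distribution training features. Lower scores indicate
    likely OOD inputs.
    """
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                 # (n_test, n_train) cosine similarities
    topk = np.sort(sims, axis=1)[:, -k:]  # k largest similarities per test image
    return topk.mean(axis=1)              # high = in-distribution, low = OOD

# Stand-in features; real usage would extract these with a frozen encoder.
rng = np.random.default_rng(0)
id_feats = rng.normal(size=(1000, 512))     # in-distribution training features
query_feats = rng.normal(size=(8, 512))     # features of images to score
scores = knn_ood_score(id_feats, query_feats, k=1)
```

Because the score needs no in-distribution labels, only stored training features, it applies directly to the unsupervised setting described above.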