Recognizing out-of-distribution (OOD) samples is critical for machine learning systems deployed in the open world. The vast majority of OOD detection methods are driven by a single modality (e.g., either vision or language), leaving the rich information in multi-modal representations untapped. Inspired by the recent success of vision-language pre-training, this paper enriches the landscape of OOD detection from a single-modal to a multi-modal regime. Particularly, we propose Maximum Concept Matching (MCM), a simple yet effective zero-shot OOD detection method based on aligning visual features with textual concepts. We contribute in-depth analysis and theoretical insights to understand the effectiveness of MCM. Extensive experiments demonstrate that MCM achieves superior performance on a wide variety of real-world tasks. MCM with vision-language features outperforms a common baseline with pure visual features on a hard OOD task with semantically similar classes by 13.1% (AUROC). Code is available at https://github.com/deeplearning-wisc/MCM.
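
To make the idea of "aligning visual features with textual concepts" concrete, below is a minimal sketch of MCM-style zero-shot OOD scoring, assuming OpenAI's CLIP package and PyTorch; the prompt template, class names, and the function name `mcm_score` are illustrative choices, not part of the released code.

```python
# Minimal sketch of zero-shot OOD scoring in the spirit of MCM,
# assuming OpenAI's CLIP package (https://github.com/openai/CLIP) and PyTorch.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

# In-distribution concept set: one textual prompt per ID class name (illustrative).
id_class_names = ["dog", "cat", "car"]
prompts = clip.tokenize([f"a photo of a {c}" for c in id_class_names]).to(device)

@torch.no_grad()
def mcm_score(image_path: str, temperature: float = 1.0) -> float:
    """Return the maximum softmax over temperature-scaled cosine similarities
    between the image embedding and the concept embeddings.
    Lower scores suggest the input is more likely OOD."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(prompts)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    sims = img_feat @ txt_feat.T                  # cosine similarities to each concept
    probs = (sims / temperature).softmax(dim=-1)  # concept-matching probabilities
    return probs.max().item()

# A test input is flagged as OOD when its score falls below a chosen threshold.
```

In this sketch, an input is scored against the in-distribution concept set only; no OOD data or extra training is required, which is what makes the approach zero-shot.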