In a standard open-set detection problem, samples of known classes (also called closed-set classes) are used to train a classifier. In testing, the classifier can (1) classify test samples of the known classes into their respective classes and (2) detect samples that do not belong to any of the known classes (we say they belong to some unknown or open-set classes). This paper studies the problem of zero-shot open-set detection, which performs the same two tasks in testing but uses no training data other than the given known class names. The paper proposes a novel yet simple method (called ZO-CLIP) to solve the problem. ZO-CLIP builds on recent advances in zero-shot classification through multi-modal representation learning. It first extends the pre-trained multi-modal model CLIP by training a text-based image description generator on top of CLIP. In testing, it uses the extended model to generate candidate unknown class names for each test sample and computes a confidence score based on both the known class names and the candidate unknown class names for zero-shot open-set detection. Experimental results on 5 benchmark datasets for open-set detection confirm that ZO-CLIP outperforms the baselines by a large margin.
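The scoring idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: random vectors stand in for CLIP image and text embeddings, the generated candidate unknown class names are assumed to already be embedded, and the confidence score is taken (as one plausible choice) to be the softmax probability mass assigned to the known classes when the softmax is computed over known and candidate unknown class names together, so strong matches to candidate unknown names push the score down.

```python
import numpy as np

def confidence_score(image_emb, known_text_embs, candidate_unknown_text_embs,
                     temperature=0.01):
    """Score in [0, 1]: probability mass on known classes under a softmax
    over cosine similarities to known + candidate-unknown class names.
    A low score suggests an open-set (unknown) sample."""
    all_embs = np.vstack([known_text_embs, candidate_unknown_text_embs])
    # cosine similarity between the image embedding and every class-name embedding
    sims = (all_embs @ image_emb) / (
        np.linalg.norm(all_embs, axis=1) * np.linalg.norm(image_emb))
    logits = sims / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[: len(known_text_embs)].sum()

# Stand-in embeddings (hypothetical; real ZO-CLIP would use CLIP encoders)
rng = np.random.default_rng(0)
dim = 512
known = rng.normal(size=(5, dim))    # embeddings of 5 known class names
unknown = rng.normal(size=(3, dim))  # embeddings of 3 generated candidate unknown names

closed_image = known[0] + 0.1 * rng.normal(size=dim)    # resembles known class 0
open_image = unknown[0] + 0.1 * rng.normal(size=dim)    # resembles a candidate unknown

closed_score = confidence_score(closed_image, known, unknown)  # high: known sample
open_score = confidence_score(open_image, known, unknown)      # low: open-set sample
```

Thresholding such a score then yields the two test-time decisions at once: samples above the threshold are assigned to the highest-scoring known class, while samples below it are flagged as open-set.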