Self-supervised models trained with a contrastive loss, such as CLIP, have been shown to be very powerful in zero-shot classification settings. However, to be used as a zero-shot classifier, these models require the user to provide new captions over a fixed set of labels at test time. In many settings, it is hard or impossible to know whether a new query caption is compatible with the source captions used to train the model. We address these limitations by framing the zero-shot classification task as an outlier detection problem and develop a conformal prediction procedure to assess when a given test caption may be reliably used. On a real-world medical example, we show that our proposed conformal procedure improves the reliability of CLIP-style models in the zero-shot classification setting, and we provide an empirical analysis of the factors that may affect its performance.
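As a rough illustration of the kind of split-conformal outlier test described above, the sketch below computes a conformal p-value for a test caption and flags it as unreliable when the p-value falls below a significance level. It assumes negative CLIP image-text cosine similarity as the nonconformity score; the paper's exact score, the `conformal_pvalue` helper, and the synthetic calibration data are all illustrative, not the authors' implementation.

```python
import numpy as np

def conformal_pvalue(cal_scores: np.ndarray, test_score: float) -> float:
    """Split-conformal p-value for a test nonconformity score.

    cal_scores: nonconformity scores from a held-out calibration set
                of in-distribution caption/image pairs
    test_score: nonconformity score of the new test caption
    """
    n = len(cal_scores)
    # Fraction of calibration scores at least as extreme as the test score,
    # with the +1 correction that makes the p-value valid under exchangeability.
    return (1 + np.sum(cal_scores >= test_score)) / (n + 1)

# Hypothetical usage: negative cosine similarity as nonconformity, so captions
# that match the calibration distribution poorly get large scores and small
# p-values. The numbers below are synthetic stand-ins, not real CLIP outputs.
rng = np.random.default_rng(0)
cal_scores = -rng.normal(0.30, 0.05, size=500)  # stand-in for -cosine_sim
test_score = -0.10                              # a poorly matching caption
alpha = 0.05                                    # tolerated false-alarm rate

p = conformal_pvalue(cal_scores, test_score)
print(f"p-value = {p:.3f}, caption deemed reliable: {p > alpha}")
```

By construction, if the test caption is exchangeable with the calibration captions, the p-value is below `alpha` with probability at most `alpha`, which is what gives the reliability guarantee a conformal procedure provides.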