In realistic open-set scenarios, where the labels of part of the test data are entirely unknown, current prompt methods for vision-language (VL) models tend to predict samples from unknown classes as one of the downstream training classes. This label bias hinders open-set recognition (OSR), in which an image should be correctly predicted either as one of the known classes or as unknown. To learn prompts in open-set scenarios, we propose Regularized prompt Tuning (R-Tuning) to mitigate the label bias. R-Tuning introduces open words from WordNet to extend the vocabulary used to form prompt texts beyond the closed-set label words, so that prompts are tuned in a simulated open-set scenario. In addition, motivated by the observation that classifying directly over a large label set yields a much higher false positive rate than classifying over a small one, we propose the Combinatorial Tuning and Testing (CTT) strategy to improve performance. CTT decomposes R-Tuning on large datasets into multiple independent group-wise tunings over fewer classes, and then makes the final prediction by selecting the optimal sub-prompt. For fair comparison, we construct new baselines for OSR based on VL models, especially for prompt methods. Our method achieves the best results on datasets of various scales, and extensive ablation studies validate its effectiveness.
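To make the open-word extension concrete, the following is a minimal sketch (not the authors' code) of how WordNet nouns could enlarge the set of label words used to build prompt texts, so that tuning sees a simulated open-set vocabulary. It assumes NLTK with the WordNet corpus installed; the function names and the prompt template are illustrative only.

```python
# Illustrative sketch: extend closed-set label words with "open words" from WordNet.
# Assumes: pip install nltk; nltk.download("wordnet")
import random
from nltk.corpus import wordnet as wn


def sample_open_words(closed_labels, num_open=100, seed=0):
    """Sample WordNet noun lemmas that do not overlap with the closed-set labels."""
    closed = {c.lower().replace(" ", "_") for c in closed_labels}
    nouns = {lemma.lower() for syn in wn.all_synsets("n") for lemma in syn.lemma_names()}
    candidates = sorted(nouns - closed)
    random.Random(seed).shuffle(candidates)
    return [w.replace("_", " ") for w in candidates[:num_open]]


def build_prompt_texts(closed_labels, open_words, template="a photo of a {}."):
    """Prompt texts cover closed-set labels plus sampled open words,
    simulating an open-set scenario during prompt tuning."""
    return [template.format(w) for w in list(closed_labels) + list(open_words)]


closed_labels = ["golden retriever", "tabby cat", "sports car"]
open_words = sample_open_words(closed_labels, num_open=5)
print(build_prompt_texts(closed_labels, open_words))
```

In such a sketch, the extra open-word prompts act as a regularizer: the tuned prompt is discouraged from assigning all probability mass to the downstream training classes, which is the label bias the abstract describes.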