Large-scale pre-trained Vision-Language Models (VLMs) have exhibited impressive zero-shot performance and transferability, allowing them to adapt to downstream tasks in a data-efficient manner. However, when only a few labeled samples are available, adapting VLMs to distinguish subtle differences between similar classes in specific downstream tasks remains challenging. In this work, we propose a Simple yet effective Negative Learning approach, SimNL, to more efficiently exploit the task-specific knowledge from few-shot labeled samples. Unlike previous methods that focus on identifying a set of representative positive features defining "what is a {CLASS}", SimNL discovers a complementary set of negative features that define "what is not a {CLASS}", providing additional insights that supplement the positive features to enhance task-specific recognition capability. Further, we identify that current adaptation approaches are particularly vulnerable to potential noise in the few-shot sample set. To mitigate this issue, we introduce a plug-and-play few-shot instance reweighting technique to suppress noisy outliers and amplify clean samples for more stable adaptation. Our extensive experimental results across 15 datasets validate that the proposed SimNL outperforms existing state-of-the-art methods on both few-shot learning and domain generalization tasks while achieving competitive computational efficiency. Code is available at https://github.com/zhangce01/SimNL.
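To make the core idea concrete, below is a minimal, hypothetical sketch (not the official SimNL implementation) of how positive and negative class features could be combined for few-shot classification, assuming L2-normalized CLIP-style image embeddings. The helper names (`build_class_features`, `simnl_style_logits`, `instance_weights`), the subtraction-based fusion, and the similarity-based reweighting rule are illustrative assumptions; the paper's actual formulation may differ.

```python
# Hypothetical sketch of negative learning + instance reweighting (not the official SimNL code).
import torch
import torch.nn.functional as F

def build_class_features(support_feats: torch.Tensor, support_labels: torch.Tensor, num_classes: int):
    """Per class: a mean positive prototype ("what is a {CLASS}") and a negative
    prototype built from the support samples of all other classes ("what is not a {CLASS}")."""
    pos, neg = [], []
    for c in range(num_classes):
        in_c = support_labels == c
        pos.append(F.normalize(support_feats[in_c].mean(0), dim=-1))
        neg.append(F.normalize(support_feats[~in_c].mean(0), dim=-1))
    return torch.stack(pos), torch.stack(neg)  # each [C, D]

def simnl_style_logits(query_feats, pos_proto, neg_proto, alpha=1.0):
    """Score each query by similarity to the positive prototype minus
    similarity to the negative prototype (assumed fusion rule)."""
    q = F.normalize(query_feats, dim=-1)
    return q @ pos_proto.T - alpha * (q @ neg_proto.T)  # [M, C]

def instance_weights(support_feats, support_labels, pos_proto, tau=10.0):
    """Down-weight support samples far from their own class prototype --
    a simple stand-in for the paper's few-shot instance reweighting."""
    q = F.normalize(support_feats, dim=-1)
    sims = (q * pos_proto[support_labels]).sum(-1)        # cosine to own class prototype
    return torch.softmax(tau * sims, dim=0) * len(sims)   # weights with mean ~ 1
```

In this sketch, a noisy or mislabeled support sample sits far from its class prototype, receives a small weight, and therefore contributes less when prototypes are re-estimated, which mirrors the paper's stated goal of suppressing outliers and amplifying clean samples.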