Few-Shot Recognition (FSR) tackles classification tasks by training with minimal task-specific labeled data. Prevailing methods adapt or finetune a pretrained Vision-Language Model (VLM) and augment the scarce training data by retrieving task-relevant but noisy samples from open data sources. The finetuned VLM generalizes reasonably well to the task-specific in-distribution (ID) test data but struggles with out-of-distribution (OOD) test data. This motivates our study of robust FSR with VLM finetuning. The core challenge of FSR is data scarcity, which extends beyond limited training data to a complete lack of validation data. We identify a potential solution that carries a key paradox: repurposing the retrieved open data for validation. Because such retrieved data are inherently OOD with respect to the task-specific ID training data, finetuned VLMs yield degraded performance on them. This biases the validation logic toward the pretrained model without any finetuning, hindering improvements in generalization. To resolve this dilemma, we introduce a novel validation strategy that harmonizes the performance gain on the few-shot ID data with the performance degradation on the retrieved data. Our validation enables parameter selection for partial finetuning and checkpoint selection, mitigating overfitting and improving test-data generalization. We unify this strategy with robust learning into a cohesive framework: Validation-Enabled Stage-wise Tuning (VEST). Extensive experiments on the established ImageNet OOD benchmarks show that VEST significantly outperforms existing VLM adaptation methods, achieving state-of-the-art FSR performance on both ID and OOD data.
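To make the validation idea concrete, the sketch below illustrates one plausible instantiation of a score that "harmonizes" gains on the few-shot ID data with degradation on the retrieved data. The function names, the use of a harmonic mean, and the toy numbers are all assumptions for illustration, not the paper's exact formulation; VEST's actual criterion may differ.

```python
# Hypothetical sketch (NOT the paper's exact criterion): combine accuracy on
# the few-shot ID data with accuracy on the retrieved open data, so that
# neither a finetuned model that collapses on retrieved/OOD data nor the
# untouched pretrained model dominates checkpoint selection outright.

def validation_score(acc_id: float, acc_retrieved: float) -> float:
    """Harmonic mean of the two accuracies: high only when both are high."""
    if acc_id + acc_retrieved == 0.0:
        return 0.0
    return 2.0 * acc_id * acc_retrieved / (acc_id + acc_retrieved)

def select_checkpoint(checkpoints: list[dict]) -> dict:
    """Pick the checkpoint whose (acc_id, acc_retrieved) pair scores highest."""
    return max(
        checkpoints,
        key=lambda c: validation_score(c["acc_id"], c["acc_retrieved"]),
    )

# Toy illustration: a finetuned checkpoint that overfits (strong on ID,
# weak on retrieved data) loses to one that balances both, and the
# pretrained model no longer wins by default.
ckpts = [
    {"name": "pretrained",  "acc_id": 0.60, "acc_retrieved": 0.70},
    {"name": "overfit_ft",  "acc_id": 0.85, "acc_retrieved": 0.40},
    {"name": "balanced_ft", "acc_id": 0.78, "acc_retrieved": 0.66},
]
best = select_checkpoint(ckpts)
```

Under this toy scoring, the balanced finetuned checkpoint is selected, matching the intuition that validation should reward models that improve on ID data without sacrificing robustness on the retrieved data.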