Fine-grained object retrieval aims to learn discriminative representations for retrieving visually similar objects. However, existing top-performing methods usually impose pairwise similarities on the semantic embedding space and continually fine-tune the entire model in limited-data regimes, which makes them prone to converging to suboptimal solutions. In this paper, we develop Fine-grained Retrieval Prompt Tuning (FRPT), which steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompting and feature adaptation. Specifically, FRPT only needs to learn a small number of parameters in the prompt and adaptation modules instead of fine-tuning the entire model, thereby addressing the convergence to suboptimal solutions caused by full fine-tuning. Technically, as the sample prompt, a structure perturbation prompt (SPP) is introduced to zoom in on, and even exaggerate, the pixels that contribute to category prediction via a content-aware inhomogeneous sampling operation. In this way, SPP brings the fine-grained retrieval task, aided by the perturbation prompts, closer to the task solved during the original pre-training. In addition, a category-specific awareness head is proposed as the feature adaptation: it uses instance normalization to remove the species discrepancies in the features extracted by the pre-trained model, so that the optimized features contain only the discrepancies among subcategories. Extensive experiments demonstrate that FRPT, with fewer learnable parameters, achieves state-of-the-art performance on three widely used fine-grained datasets.
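The abstract names two mechanisms without giving formulas. The first is a content-aware inhomogeneous sampling operation used by SPP. The second is instance normalization inside the category-specific awareness head. The NumPy sketch below illustrates both under stated assumptions and is not the authors' implementation. The sampler is a stand-in technique: a separable inverse-CDF warp driven by a non-negative saliency map, so that high-saliency rows and columns receive more output pixels. FRPT's actual operator may differ. The head applies instance normalization to frozen backbone features, then global max pooling and a linear projection. The names `structure_perturbation_prompt`, `awareness_head`, `inverse_cdf_coords`, and `proj` are hypothetical. Only `proj` would be trained, matching the idea of keeping the pre-trained model frozen.

```python
import numpy as np


def inverse_cdf_coords(weights, n_out):
    """Map n_out evenly spaced output positions to fractional input indices.

    Input positions with larger (non-negative) weight receive more output
    samples, so they are magnified; low-weight positions are compressed.
    """
    w = np.asarray(weights, dtype=float) + 1e-6      # keep the CDF strictly increasing
    cdf = np.concatenate([[0.0], np.cumsum(w)])
    cdf /= cdf[-1]                                   # CDF at pixel boundaries 0..n
    quantiles = (np.arange(n_out) + 0.5) / n_out     # uniform targets in (0, 1)
    boundaries = np.arange(len(w) + 1, dtype=float)
    # Invert the piecewise-linear CDF, then shift boundary coords to pixel centres.
    return np.interp(quantiles, cdf, boundaries) - 0.5


def resample_axis(img, coords, axis):
    """Linearly interpolate `img` at fractional indices `coords` along `axis`."""
    n = img.shape[axis]
    c = np.clip(coords, 0, n - 1)
    lo = np.floor(c).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = (c - lo).reshape([-1 if d == axis else 1 for d in range(img.ndim)])
    return np.take(img, lo, axis=axis) * (1 - frac) + np.take(img, hi, axis=axis) * frac


def structure_perturbation_prompt(img, saliency, out_hw=None):
    """Hypothetical SPP-style warp that zooms into high-saliency regions.

    img: (H, W) or (H, W, C) array. saliency: non-negative (H, W) map, assumed
    here to come from the frozen backbone (e.g. a class-activation map).
    """
    h, w = saliency.shape
    out_h, out_w = out_hw if out_hw is not None else (h, w)
    ys = inverse_cdf_coords(saliency.sum(axis=1), out_h)  # row marginal
    xs = inverse_cdf_coords(saliency.sum(axis=0), out_w)  # column marginal
    return resample_axis(resample_axis(img, ys, axis=0), xs, axis=1)


def instance_norm(feat, eps=1e-5):
    """Normalize each channel of a (C, H, W) map over its spatial positions,
    removing that image's per-channel mean and scale."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    var = feat.var(axis=(1, 2), keepdims=True)
    return (feat - mu) / np.sqrt(var + eps)


def awareness_head(feat, proj):
    """Hypothetical head on frozen features: IN -> global max pool -> linear
    projection `proj` (the trainable part) -> L2-normalized embedding.
    Max pooling is used because the spatial mean after IN is zero per channel."""
    z = instance_norm(feat).max(axis=(1, 2))       # (C,)
    emb = proj @ z                                 # (D,)
    return emb / (np.linalg.norm(emb) + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32, 3))
    sal = np.full((32, 32), 0.1)
    sal[12:20, 12:20] = 1.0                        # salient 8x8 "part"
    warped = structure_perturbation_prompt(img, sal)
    ys = inverse_cdf_coords(sal.sum(axis=1), 32)
    band_rows = int(((ys >= 11.5) & (ys <= 19.5)).sum())
    print(warped.shape, "output rows sampled from the 8-row band:", band_rows)

    feat = rng.normal(size=(16, 7, 7))             # stand-in frozen features
    proj = rng.normal(size=(8, 16))
    print(awareness_head(feat, proj).shape)
```

With a uniform saliency map the warp reduces to the identity. Concentrating saliency on an 8×8 region makes about twice as many output rows come from that region as uniform sampling would, which is the "zoom" effect. Instance normalization makes the embedding insensitive to per-image, per-channel offsets and positive rescaling of the frozen features. This is one concrete reading of "removing species discrepancies."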