Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories from only a few labeled samples. The task inherits the main challenges of both few-shot learning and fine-grained recognition. First, the scarcity of labeled samples makes the learned model prone to overfitting. Second, the datasets exhibit high intra-class variance and low inter-class difference. To address this task, we propose a two-stage background suppression and foreground alignment framework composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local-to-local (L2L) similarity metric. Specifically, the BAS module generates a foreground mask for localization, which weakens background disturbance and enhances the dominant foreground objects. Moreover, given the scarcity of labeled samples, we compute the pairwise similarity of feature maps using both the raw image and the refined image. The FOA module then reconstructs the feature map of each support sample according to its correlation with the query samples, which addresses the misalignment between support-query image pairs. To capture subtle differences between easily confused samples, we present a novel L2L similarity metric that measures the local similarity between a pair of aligned spatial features in the embedding space. Extensive experiments on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state of the art by a large margin.
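To make the alignment-then-comparison idea concrete, the following is a minimal sketch and not the formulation used in this work. It assumes each feature map is flattened into a list of local descriptors, that the support map is reconstructed as a softmax-weighted combination of its own descriptors (weighted by cosine similarity to each query location), and that the local similarity is the mean position-wise cosine similarity. The function names and the `temperature` parameter are hypothetical and chosen only for illustration.

```python
import math
import random


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two local descriptors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def align_support_to_query(support, query, temperature=0.1):
    """Rebuild the support map so that position i matches query position i.

    Assumption: each output descriptor is a softmax-weighted sum of all
    support descriptors, weighted by their similarity to the query descriptor.
    """
    dim = len(support[0])
    aligned = []
    for q in query:
        logits = [cosine(q, s) / temperature for s in support]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(value - peak) for value in logits]
        total = sum(weights)
        aligned.append(
            [sum(w * s[d] for w, s in zip(weights, support)) / total
             for d in range(dim)]
        )
    return aligned


def local_to_local_similarity(aligned_support, query):
    """Mean cosine similarity between descriptors at matching positions."""
    scores = [cosine(a, q) for a, q in zip(aligned_support, query)]
    return sum(scores) / len(scores)


# Toy data: the query holds the same descriptors as the support map,
# shifted by three positions to mimic a spatially misaligned object.
rng = random.Random(0)
support = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(9)]
query = support[3:] + support[:3]

unaligned_score = local_to_local_similarity(support, query)
aligned_score = local_to_local_similarity(
    align_support_to_query(support, query), query
)
print(f"unaligned: {unaligned_score:.3f}, aligned: {aligned_score:.3f}")
```

In this toy setting the position-wise comparison only becomes informative after the support descriptors have been rearranged to follow the query layout, which is the motivation for aligning before measuring local similarity. A practical implementation would operate on batched tensors from a learned backbone rather than on Python lists.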