Few-shot learning often involves metric-learning-based classifiers, which predict an image's label by comparing the distance between its extracted feature vector and the class representations. However, applying global pooling at the end of the feature extractor may not produce an embedding that correctly focuses on the class object. In this work, we propose a novel framework that generates class representations by extracting features from the class-relevant regions of the images. Given only a few exemplary images with image-level labels, our framework first localizes the class objects by spatially decomposing the similarity between the images and their class prototypes. Enhanced class representations are then obtained from the localization results. We also propose a loss function that sharpens the distinctions among the refined features. Our method outperforms the baseline few-shot model on the miniImageNet and tieredImageNet benchmarks.
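The core idea above can be illustrated with a minimal sketch. This is not the authors' implementation; the function name `refine_prototype`, the thresholding rule, and all shapes are assumptions made for illustration. It scores each spatial location of a convolutional feature map by its cosine similarity to a class prototype, then average-pools only the high-similarity locations instead of pooling globally:

```python
# Hedged sketch (assumed shapes and names, not the paper's code): localize the
# class object by spatially decomposing image-prototype similarity, then build
# an enhanced class representation from the class-relevant locations.
import numpy as np

def refine_prototype(feat_map, prototype, tau=0.5):
    """feat_map: (C, H, W) conv features before global pooling.
    prototype: (C,) class representation (e.g. a mean-pooled support feature).
    Returns (refined_prototype, similarity_map)."""
    C, H, W = feat_map.shape
    flat = feat_map.reshape(C, H * W)                       # (C, HW)
    # cosine similarity between each spatial location and the prototype
    sim = (prototype @ flat) / (
        np.linalg.norm(prototype) * np.linalg.norm(flat, axis=0) + 1e-8)
    # keep locations whose similarity exceeds a fraction of the maximum
    mask = sim >= tau * sim.max()
    if not mask.any():  # fallback when every similarity is negative
        mask = sim == sim.max()
    refined = flat[:, mask].mean(axis=1)                    # masked average pool
    return refined, sim.reshape(H, W)

# toy example: a prototype built from one spatial region of the feature map
rng = np.random.default_rng(0)
fm = rng.normal(size=(8, 4, 4))
proto = fm[:, :2, :2].reshape(8, -1).mean(axis=1)           # "object" top-left
refined, sim_map = refine_prototype(fm, proto)
```

Compared with global average pooling, the masked pool suppresses background locations whose features are dissimilar to the prototype, which is the intuition behind extracting features from class-relevant regions only.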