Few-shot learning is a challenging problem because only a few examples are provided to recognize a new class. Several recent studies exploit additional semantic information, e.g., text embeddings of class names, to address the scarcity of samples by combining semantic prototypes with visual prototypes. However, these methods still suffer from spurious visual features learned from the scarce support samples, resulting in limited benefits. In this paper, we propose a novel Semantic Prompt (SP) approach for few-shot learning. Instead of naively exploiting semantic information to remedy classifiers, we explore leveraging semantic information as prompts to adaptively tune the visual feature extraction network. Specifically, we design two complementary mechanisms to insert semantic prompts into the feature extractor: one enables interaction between semantic prompts and patch embeddings along the spatial dimension via self-attention, and the other supplements visual features with the transformed semantic prompts along the channel dimension. By combining these two mechanisms, the feature extractor is better able to attend to class-specific features and obtains more generalizable image representations from merely a few support samples. Through extensive experiments on four datasets, the proposed approach achieves promising results, improving 1-shot learning accuracy by 3.67% on average.
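The two mechanisms above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the paper's implementation): the spatial mechanism prepends a projected semantic prompt as an extra token so self-attention lets it interact with every patch embedding, and the channel mechanism adds an MLP-transformed semantic vector to the pooled visual feature. All dimensions, projection matrices (`W_p`, `Wq`, `Wk`, `Wv`, `W1`, `W2`), and the random inputs are stand-ins for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                      # embedding dimension (illustrative)
N = 9                                       # number of image patch tokens
patches = rng.standard_normal((N, D))       # patch embeddings from the backbone
semantic = rng.standard_normal(D)           # text embedding of the class name

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Mechanism 1 (spatial): project the semantic vector into the token space and
# prepend it, so self-attention mixes it with every patch embedding.
W_p = rng.standard_normal((D, D)) * 0.1     # hypothetical prompt projection
tokens = np.vstack([semantic @ W_p, patches])           # (N+1, D)
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
attn = softmax(Q @ K.T / np.sqrt(D))                    # (N+1, N+1)
spatial_feat = (attn @ V).mean(axis=0)                  # pooled attended tokens

# Mechanism 2 (channel): transform the semantic vector with a small MLP and
# add it to the pooled visual feature, channel by channel.
W1, W2 = rng.standard_normal((D, D)) * 0.1, rng.standard_normal((D, D)) * 0.1
modulation = np.tanh(semantic @ W1) @ W2
channel_feat = patches.mean(axis=0) + modulation        # (D,)

# A combined representation could simply average the two branches.
combined = 0.5 * (spatial_feat + channel_feat)
```

In this sketch the semantic prompt influences feature extraction itself, rather than being fused only at the classifier, which is the key distinction the abstract draws from prototype-combination approaches.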