Few-shot recognition aims to recognize novel categories from a limited number of labeled examples per class. To encourage learning from a supplementary view, recent approaches have introduced auxiliary semantic modalities into effective metric-learning frameworks, which aim to learn a feature similarity between training samples (the support set) and test samples (the query set). However, these approaches augment only the representations of samples with available semantics and ignore the query set, which forgoes potential improvement and may cause a shift between the combined-modality representation and the pure-visual representation. In this paper, we devise an attributes-guided attention module (AGAM) that uses human-annotated attributes to learn more discriminative features. This plug-and-play module enables visual content and the corresponding attributes to jointly focus on important channels and regions for the support set. Feature selection is also performed for the query set using only visual information, since attributes are not available for it. Representations from both sets are therefore improved in a fine-grained manner. Moreover, an attention alignment mechanism is proposed to distill knowledge from the attribute guidance into the pure-visual branch for samples without attributes. Extensive experiments and analysis show that our proposed module significantly improves simple metric-based approaches, achieving state-of-the-art performance on different datasets and settings.
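The two-branch idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a single linear layer followed by a sigmoid for each branch, covers channel attention only (the abstract also mentions regions), and uses a mean absolute difference as the alignment term. All function names, shapes, and weights below are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_avg_pool(feature_map):
    """Average each channel over its spatial positions: [C][H*W] -> [C]."""
    return [sum(channel) / len(channel) for channel in feature_map]


def linear(vec, weights):
    """Apply a weight matrix of shape [out][in] to a vector of length in."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def channel_attention(pooled, weights, attributes=None):
    """Return one gate in (0, 1) per channel.

    With attributes (support set), the gate sees visual and attribute inputs.
    Without them (query set), it sees the pooled visual features only.
    """
    inputs = pooled + (attributes or [])
    return [sigmoid(z) for z in linear(inputs, weights)]


def apply_attention(feature_map, attention):
    """Rescale every channel by its attention gate."""
    return [[a * x for x in channel] for channel, a in zip(feature_map, attention)]


def alignment_loss(attr_attention, self_attention):
    """Assumed alignment term: mean absolute gap between the two gates."""
    gaps = [abs(a - b) for a, b in zip(attr_attention, self_attention)]
    return sum(gaps) / len(gaps)


# Toy data: 4 channels, 9 spatial positions, 3 binary attributes (all made up).
rng = random.Random(0)
C, HW, A = 4, 9, 3
feature_map = [[rng.random() for _ in range(HW)] for _ in range(C)]
attributes = [1.0, 0.0, 1.0]
w_attr = [[rng.uniform(-1, 1) for _ in range(C + A)] for _ in range(C)]
w_self = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C)]

pooled = global_avg_pool(feature_map)
attr_attention = channel_attention(pooled, w_attr, attributes)  # support branch
self_attention = channel_attention(pooled, w_self)              # query branch
refined_support = apply_attention(feature_map, attr_attention)
loss = alignment_loss(attr_attention, self_attention)
```

In training, a term like `loss` would be minimized on support samples so that the visual-only gate learns to imitate the attribute-guided one. At test time only the visual-only gate would be needed for query samples.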