Zero-shot learning (ZSL) aims to recognize classes that have no samples in the training set. A representative approach directly learns an embedding function that associates visual features with the corresponding class semantics in order to recognize new classes. Many methods build on this approach, and recent ones focus especially on extracting rich features from images, e.g., attribute features. These attribute features are normally extracted within each individual image; however, the traits shared by features that come from different images but belong to the same attribute are not emphasized. In this paper, we propose a new framework that boosts ZSL by explicitly learning attribute prototypes beyond individual images and contrastively optimizing them against attribute-level features within images. Besides the novel architecture, we highlight two elements for attribute representations: a new prototype generation module that generates attribute prototypes from attribute semantics, and a hard-example-based contrastive optimization scheme that reinforces attribute-level features in the embedding space. We explore two alternative backbones, CNN-based and transformer-based, to build our framework, and we conduct experiments on three standard benchmarks: CUB, SUN, and AwA2. Results on these benchmarks demonstrate that our method improves the state of the art by a considerable margin. Our code will be available at https://github.com/dyabel/CoAR-ZSL.git
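To make the idea of contrastively optimizing attribute prototypes against attribute-level features more concrete, the following is a minimal sketch of a generic InfoNCE-style loss, not the loss used in the paper. The function name `prototype_contrastive_loss`, the `temperature` value, and the array shapes are all assumptions made for illustration; the hard-example selection and the prototype generation module described above are omitted.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prototype_contrastive_loss(features, prototypes, attr_ids, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative assumption, not the paper's loss).

    features:   (N, D) attribute-level features extracted from images
    prototypes: (A, D) one prototype per attribute
    attr_ids:   (N,)   index of the attribute each feature belongs to
    """
    f = l2_normalize(features)
    p = l2_normalize(prototypes)

    # Cosine similarity of every feature to every prototype, scaled by temperature.
    logits = f @ p.T / temperature                       # shape (N, A)

    # Numerically stable log-softmax over the prototype axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Pull each feature toward its own prototype, push it away from the others.
    rows = np.arange(len(attr_ids))
    return float(-log_prob[rows, attr_ids].mean())
```

Under these assumptions, features that lie close to the prototype of their own attribute yield a lower loss than features assigned to the wrong prototype, which is the behaviour a prototype-based contrastive objective is meant to encourage.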