From the beginning of zero-shot learning research, visual attributes have been shown to play an important role. To better transfer attribute-based knowledge from known to unknown classes, we argue that an image representation with integrated attribute localization ability would benefit zero-shot learning. To this end, we propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features using only class-level attributes. While a visual-semantic embedding layer learns global features, local features are learned through an attribute prototype network that simultaneously regresses and decorrelates attributes from intermediate features. We show that our locality-augmented image representations achieve a new state of the art on three zero-shot learning benchmarks. As an additional benefit, our model points to the visual evidence of the attributes in an image, e.g., on the CUB dataset, confirming the improved attribute localization ability of our image representation.
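The core idea above, per-attribute prototypes matched against intermediate spatial features to both score an attribute and point to its location, can be illustrated with a minimal sketch. All shapes, the random inputs, and the choice of max-pooling over the similarity map are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

# Hypothetical sizes: C feature channels, an HxW spatial grid, K attributes.
C, H, W, K = 8, 4, 4, 5
rng = np.random.default_rng(0)

feat = rng.normal(size=(C, H, W))     # intermediate CNN feature map (assumed)
prototypes = rng.normal(size=(K, C))  # one learnable prototype per attribute

# Similarity map: dot product between each prototype and each spatial feature.
sim = np.einsum('kc,chw->khw', prototypes, feat)  # shape (K, H, W)

# Attribute score: max over spatial locations, i.e. the best-matching region.
attr_pred = sim.reshape(K, -1).max(axis=1)        # shape (K,)

# The argmax location is the "visual evidence" for each attribute.
locs = sim.reshape(K, -1).argmax(axis=1)          # indices into the HxW grid
```

During training, `attr_pred` would be regressed toward the class-level attribute vector, so attribute supervision shapes the local features without any part-level annotation.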