Most existing Zero-Shot Learning (ZSL) methods focus on learning a compatibility function between the image representation and class attributes. A few others concentrate on learning image representations that combine local and global features. However, existing approaches still fail to address the bias towards seen classes. In this paper, we propose implicit and explicit attention mechanisms to address this bias problem in ZSL models. We formulate the implicit attention mechanism with a self-supervised image angle rotation task, which focuses on specific image features that aid in solving the task. The explicit attention mechanism is formulated with the multi-headed self-attention of a Vision Transformer model, which learns to map image features to the semantic space during the training stage. We conduct comprehensive experiments on three popular benchmarks: AWA2, CUB, and SUN. The proposed attention mechanisms prove effective and achieve state-of-the-art harmonic mean on all three datasets.
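The following is a minimal PyTorch sketch, not the authors' implementation, illustrating the two ideas described above: an auxiliary self-supervised rotation-prediction head (implicit attention) trained jointly with a head that maps image features into the class-attribute space (explicit attention). The backbone here is a placeholder; the paper uses a Vision Transformer whose multi-headed self-attention provides the explicit attention. The cosine-similarity compatibility loss, the equal weighting of the two losses, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch: joint attribute-compatibility and rotation-prediction training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZSLWithRotation(nn.Module):
    def __init__(self, feat_dim=512, attr_dim=85, num_rotations=4):
        super().__init__()
        # Placeholder feature extractor; in the paper a ViT backbone with
        # multi-head self-attention plays this role (explicit attention).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        # Maps image features into the semantic (attribute) space.
        self.semantic_head = nn.Linear(feat_dim, attr_dim)
        # Implicit attention: classify the rotation angle (0/90/180/270 degrees).
        self.rotation_head = nn.Linear(feat_dim, num_rotations)

    def forward(self, x):
        f = self.backbone(x)
        return self.semantic_head(f), self.rotation_head(f)

def training_step(model, images, class_attrs, labels):
    # Rotate each image by a random multiple of 90 degrees; the angle index
    # serves as a free self-supervised label.
    k = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(images, k)])
    pred_attrs, pred_rot = model(rotated)
    # Assumed compatibility loss: cosine distance to the ground-truth class attributes.
    attr_loss = (1 - F.cosine_similarity(pred_attrs, class_attrs[labels])).mean()
    rot_loss = F.cross_entropy(pred_rot, k)
    return attr_loss + rot_loss  # relative weighting is a design choice

# Example usage with AWA2-like dimensions (85 attributes, 40 seen classes).
model = ZSLWithRotation(attr_dim=85)
images = torch.randn(8, 3, 64, 64)
class_attrs = torch.randn(40, 85)
labels = torch.randint(0, 40, (8,))
loss = training_step(model, images, class_attrs, labels)
loss.backward()
```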