Generalized Zero-Shot Learning (GZSL) aims to recognize new categories by learning transferable image representations. Existing methods show that aligning image representations with their corresponding semantic labels yields semantically aligned representations that transfer to unseen categories. However, because supervision comes only from seen-category labels, the learned semantic knowledge is highly task-specific, which biases the image representations toward seen categories. In this paper, we propose a novel Dual-Contrastive Embedding Network (DCEN) that simultaneously learns task-specific and task-independent knowledge via semantic alignment and instance discrimination. First, DCEN leverages task labels to cluster representations of the same semantic category through cross-modal contrastive learning and by exploring semantic-visual complementarity. Beyond this task-specific knowledge, DCEN introduces task-independent knowledge by attracting representations of different views of the same image and repelling representations of different images. Compared with high-level seen-category supervision, this instance-discrimination supervision encourages DCEN to capture low-level visual knowledge, which is less biased toward seen categories and therefore alleviates the representation bias. Consequently, the task-specific and task-independent knowledge jointly yield transferable representations, and DCEN achieves an average improvement of 4.1% on four public benchmarks.
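The two supervision signals described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard InfoNCE-style contrastive loss for both terms, assumes image and class embeddings already live in a shared space, and uses hypothetical names (`info_nce`, `dual_contrastive_loss`, `temperature`, `weight`) that do not come from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def info_nce(anchor, candidates, positive_index, temperature=0.1):
    """InfoNCE-style loss (assumed form): softmax cross-entropy over cosine
    similarities, with candidates[positive_index] as the positive."""
    a = normalize(anchor)
    logits = [dot(a, normalize(c)) / temperature for c in candidates]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[positive_index]


def dual_contrastive_loss(view1, view2, labels, class_embeddings,
                          weight=1.0, temperature=0.1):
    """Combine the two terms (hypothetical weighting, not from the paper).

    semantic: pulls each image embedding toward its class embedding and away
              from the other class embeddings (task-specific).
    instance: pulls view1[i] toward view2[i], the other view of the same
              image, and away from views of other images (task-independent).
    """
    n = len(view1)
    semantic = sum(info_nce(view1[i], class_embeddings, labels[i], temperature)
                   for i in range(n)) / n
    instance = sum(info_nce(view1[i], view2, i, temperature)
                   for i in range(n)) / n
    return semantic + weight * instance, semantic, instance


# Toy data: two images, two augmented views each, two class embeddings.
class_embeddings = [[1.0, 0.0], [0.0, 1.0]]
view1 = [[1.0, 0.1], [0.1, 1.0]]
view2 = [[0.9, 0.0], [0.0, 0.9]]
total, semantic, instance = dual_contrastive_loss(
    view1, view2, labels=[0, 1], class_embeddings=class_embeddings)
print(total, semantic, instance)
```

In this sketch the semantic term needs labels while the instance term does not, which mirrors the abstract's distinction between task-specific and task-independent knowledge. Swapping the labels in the toy data raises only the semantic term.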