The key challenge of zero-shot learning (ZSL) is how to infer the latent semantic knowledge between visual and attribute features on seen classes, and thus achieving a desirable knowledge transfer to unseen classes. Prior works either simply align the global features of an image with its associated class semantic vector or utilize unidirectional attention to learn the limited latent semantic representations, which could not effectively discover the intrinsic semantic knowledge e.g., attribute semantics) between visual and attribute features. To solve the above dilemma, we propose a Mutually Semantic Distillation Network (MSDN), which progressively distills the intrinsic semantic representations between visual and attribute features for ZSL. MSDN incorporates an attribute$\rightarrow$visual attention sub-net that learns attribute-based visual features, and a visual$\rightarrow$attribute attention sub-net that learns visual-based attribute features. By further introducing a semantic distillation loss, the two mutual attention sub-nets are capable of learning collaboratively and teaching each other throughout the training process. The proposed MSDN yields significant improvements over the strong baselines, leading to new state-of-the-art performances on three popular challenging benchmarks, i.e., CUB, SUN, and AWA2. Our codes have been available at: \url{https://github.com/shiming-chen/MSDN}.
翻译:零光学习( ZSL) 的关键挑战是如何在视觉和属性特性之间推断出视觉和属性特性之间的潜在语义学知识,从而实现向看不见的类别转移所需的知识。 先前的工作要么只是将图像的全球特征与其相关的类语义矢量相匹配,要么是利用单向关注来学习有限的潜在语义表达方式,这些表达方式无法有效地发现视觉和属性特性之间的内在语义学知识,例如属性语义学。 为了解决上述困境,我们提议建立一个相互语义蒸馏网络(MSDN),逐步将ZSL的视觉和属性特性之间的内在语义表达方式进行蒸馏。 MSDN 包含一个属性$rightrow$视觉关注子网,学习基于属性的视觉特征,以及一个视觉和属性等内在的语义表达方式。 通过进一步引入语义淡化损失,两个相互关注子网络能够在整个培训过程中相互学习并教授对方。 拟议的MSDN 和 SAW 3 SUB 的高级性标码具有挑战性, SHUB 3 和 SUB 的高级基准。 SHUB 具有挑战性。 SUB 3 。 SUB 和 SUD 的高级基准。
Microsoft Developer Network 微软开发者网络
http:///http://msdn.microsoft.com