Zero-shot learning (ZSL) aims at recognizing classes for which no visual sample is available at training time. To address this issue, one can rely on a semantic description of each class. A typical ZSL model learns a mapping between the visual samples of seen classes and the corresponding semantic descriptions, in order to apply the same mapping to unseen classes at test time. State-of-the-art approaches rely on generative models that synthesize visual features from the prototype of a class, such that a classifier can then be learned in a supervised manner. However, these approaches are usually biased towards seen classes, whose visual instances are the only ones that can be matched to a given class prototype. We propose a regularization method that can be applied to any conditional generative-based ZSL method and that leverages only the semantic class prototypes. It learns to synthesize discriminative features for semantic descriptions that are not available at training time, that is, those of the unseen classes. The approach is evaluated for ZSL and GZSL on four datasets commonly used in the literature, in both inductive and transductive settings, with results on par with or above state-of-the-art approaches.