This paper studies generalized zero-shot learning, in which a model is trained on image-label pairs from a set of seen classes and tested on classifying new images from both seen and unseen classes. Most previous models learn a fixed one-directional mapping between the visual and semantic spaces, while some recently proposed generative methods synthesize image features for unseen classes, reducing the zero-shot learning problem to a conventional fully supervised classification problem. In this paper, we propose a novel model that provides a unified framework for three different approaches: visual->semantic mapping, semantic->visual mapping, and metric learning. Specifically, our proposed model consists of a feature generator that produces diverse visual features conditioned on class embeddings, a regressor that maps each visual feature back to its corresponding class embedding, and a discriminator that learns to evaluate the closeness of an image feature and a class embedding. All three components are trained jointly under a combination of cyclic consistency loss and dual adversarial loss. Experimental results show that our model not only attains higher accuracy in classifying images from seen classes, but also outperforms existing state-of-the-art models in classifying images from unseen classes.