We consider the problem of zero-shot recognition: learning a visual classifier for a category with zero training examples, using only the word embedding of the category and its relationship to other categories for which visual data are provided. The key to handling an unfamiliar or novel category is to transfer knowledge obtained from familiar classes to describe the unfamiliar one. In this paper, we build upon the recently introduced Graph Convolutional Network (GCN) and propose an approach that uses both semantic embeddings and categorical relationships to predict the classifiers. Given a learned knowledge graph (KG), our approach takes as input semantic embeddings for each node (representing a visual category). After a series of graph convolutions, we predict the visual classifier for each category. During training, the visual classifiers for a few categories are given in order to learn the GCN parameters. At test time, these filters are used to predict the visual classifiers of unseen categories. We show that our approach is robust to noise in the KG. More importantly, our approach yields significant improvements over the current state-of-the-art results (from 2~3% on some metrics to a whopping 20% on a few).
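To make the pipeline concrete, below is a minimal sketch (not the authors' released code) of the idea described above: a GCN that propagates per-category word embeddings over a knowledge graph and regresses them onto visual-classifier weights, with a loss applied only on the seen categories. All names (GCNLayer, ClassifierPredictor, a_hat, word_emb), the two-layer depth, and the toy sizes are illustrative assumptions.

```python
# Sketch of GCN-based zero-shot classifier prediction (assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph convolution: H' = A_hat @ H @ W (Kipf & Welling style)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):
        # a_hat: (N, N) normalized KG adjacency with self-loops
        return self.linear(a_hat @ h)

class ClassifierPredictor(nn.Module):
    """Stack of GCN layers; output dim = dimension of the visual classifier."""
    def __init__(self, emb_dim, hidden_dim, cls_dim):
        super().__init__()
        self.gc1 = GCNLayer(emb_dim, hidden_dim)
        self.gc2 = GCNLayer(hidden_dim, cls_dim)

    def forward(self, word_emb, a_hat):
        h = F.leaky_relu(self.gc1(word_emb, a_hat))
        return self.gc2(h, a_hat)  # predicted classifier weights per node

# Toy setup: N categories, word embeddings, and a placeholder adjacency.
N, emb_dim, cls_dim = 100, 300, 512
a_hat = torch.eye(N)                             # stand-in normalized adjacency
word_emb = torch.randn(N, emb_dim)               # e.g. word vectors per category
seen = torch.arange(40)                          # categories with visual data
seen_weights = torch.randn(len(seen), cls_dim)   # classifiers from a trained CNN

model = ClassifierPredictor(emb_dim, 256, cls_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):
    pred = model(word_emb, a_hat)
    loss = F.mse_loss(pred[seen], seen_weights)  # supervise seen classes only
    opt.zero_grad(); loss.backward(); opt.step()

# At test time, pred[unseen] provides classifiers for zero-shot categories.
```

The design choice this illustrates: because the graph convolutions share parameters across all nodes, supervision on the seen categories constrains how the predictor maps any embedding to classifier weights, so the unseen nodes inherit classifiers through their KG neighborhoods.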