The main question we address in this paper is how to scale up visual recognition of unseen classes, also known as zero-shot learning, to tens of thousands of categories, as in the ImageNet-21K benchmark. At this scale, especially with the many fine-grained categories in ImageNet-21K, it is critical to learn high-quality visual semantic representations that are discriminative enough to recognize unseen classes and distinguish them from seen ones. We propose a \emph{H}ierarchical \emph{G}raphical knowledge \emph{R}epresentation framework for confidence-based classification, dubbed HGR-Net. Our experimental results demonstrate that HGR-Net can capture class-inheritance relations by exploiting hierarchical conceptual knowledge. Our method significantly outperforms all existing techniques, improving performance by 7\% over the runner-up approach on the ImageNet-21K benchmark. We show that HGR-Net is learning-efficient in few-shot scenarios. We also analyze our method on smaller datasets such as ImageNet-21K-P, 2-hops, and 3-hops, demonstrating its generalization ability. Our benchmark and code are available at https://kaiyi.me/p/hgrnet.html.