Unlike general visual classification, some classification tasks are more challenging because they require assigning images to professional categories. In this paper, we call these tasks expert-level classification. Previous work on fine-grained visual classification (FGVC) has made considerable progress on some of its specific sub-tasks. However, these methods are difficult to extend to the general case, which relies on a comprehensive analysis of part-global correlations and hierarchical feature interactions. We propose Expert Network (ExpNet) to address the unique challenges of expert-level classification with a unified network. In ExpNet, we hierarchically decouple the part and context features and process them individually using a novel attention mechanism called Gaze-Shift. At each stage, Gaze-Shift produces a focal-part feature for the subsequent abstraction and memorizes a context-related embedding. We then fuse the final focal embedding with all memorized context-related embeddings to make the prediction. This architecture realizes dual-track processing of partial and global information together with hierarchical feature interactions. We conduct experiments on three representative expert-level classification tasks: FGVC, disease classification, and artwork attribute classification. In these experiments, ExpNet outperforms state-of-the-art methods across a wide range of fields, indicating its effectiveness and generalization ability. The code will be made publicly available.
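The dual-track flow described in the abstract (split each stage into a focal part and a memorized context embedding, then fuse at the end) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how Gaze-Shift scores or selects parts, so the dot-product scoring against a query vector, the top-k selection, the softmax-weighted context pooling, the `tanh` abstraction step, and fusion by concatenation are all assumptions made for illustration, as are the names `gaze_shift` and `forward`.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gaze_shift(tokens, query, keep):
    """Hypothetical stand-in for Gaze-Shift (assumed, not from the paper).

    Scores every token against `query`, keeps the `keep` highest-scoring
    tokens as the focal part, and pools the rest into one context embedding.
    """
    scores = [dot(t, query) for t in tokens]
    order = sorted(range(len(tokens)), key=lambda i: -scores[i])
    focal = [tokens[i] for i in order[:keep]]
    rest = order[keep:]
    dim = len(query)
    if not rest:  # nothing left over: return an empty (zero) context
        return focal, [0.0] * dim
    top = max(scores[i] for i in rest)
    weights = [math.exp(scores[i] - top) for i in rest]  # stable softmax
    total = sum(weights)
    context = [
        sum(w * tokens[i][d] for w, i in zip(weights, rest)) / total
        for d in range(dim)
    ]
    return focal, context


def forward(tokens, queries, projections, classifier, keep_ratio=0.5):
    """Run one Gaze-Shift per stage, then fuse focal and context tracks."""
    memory = []  # context-related embeddings memorized at each stage
    for query, proj in zip(queries, projections):
        keep = max(1, int(len(tokens) * keep_ratio))
        focal, context = gaze_shift(tokens, query, keep)
        memory.append(context)
        # Assumed abstraction step: linear projection followed by tanh.
        tokens = [[math.tanh(dot(t, row)) for row in proj] for t in focal]
    dim = len(tokens[0])
    final_focal = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    # Assumed fusion: concatenate the final focal embedding with all contexts.
    fused = final_focal + [v for ctx in memory for v in ctx]
    return [dot(fused, row) for row in classifier]


random.seed(0)
D, stages, classes = 8, 3, 5


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


tokens = [rand_vec(D) for _ in range(16)]
queries = [rand_vec(D) for _ in range(stages)]
projections = [[rand_vec(D) for _ in range(D)] for _ in range(stages)]
classifier = [rand_vec(D * (stages + 1)) for _ in range(classes)]
logits = forward(tokens, queries, projections, classifier)
print(len(logits))  # one logit per class
```

With random weights the logits are meaningless. The sketch only shows the data flow: the focal track shrinks from stage to stage, while each stage contributes one context embedding that is kept until the final fusion.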