Fine-grained visual classification (FGVC) is a challenging task, and a more critical one than traditional classification. It requires distinguishing subcategories that differ only by inherently subtle object variations within the same class. Previous works focus on enhancing feature representation by using multiple granularities and discriminative regions, located with attention strategies or bounding boxes. However, these methods rely heavily on deep neural networks, which lack interpretability. We propose an Interpretable Attention Guided Network (IAGN) for fine-grained visual classification. The contributions of our method include: i) an attention-guided framework that guides the network to extract discriminative regions in an interpretable way; ii) a progressive training mechanism that distills knowledge stage by stage to fuse features of various granularities; iii) the first interpretable FGVC method with competitive performance on several standard FGVC benchmark datasets.