Fine-grained Visual Classification (FGVC) aims to distinguish objects that belong to subcategories of the same basic category. The task is challenging because the differences between classes are subtle. Existing research uses large-scale convolutional neural networks or vision transformers as the feature extractor, which is computationally expensive. Real-world fine-grained recognition, however, often requires a lightweight mobile network that can run offline, and the feature extraction capability of a basic mobile network is weaker than that of large-scale models. In this paper, building on the lightweight MobileNetV2, we propose a Progressive Multi-Stage Interactive training method with a Recursive Mosaic Generator (RMG-PMSI). First, we propose a Recursive Mosaic Generator (RMG) that generates images with different granularities in different phases. Then, the features from different stages pass through a Multi-Stage Interaction (MSI) module, which strengthens and complements the corresponding features across stages. Finally, with progressive training (P), the features extracted by the model at different stages can be fully utilized and fused with each other. Experiments on three widely used fine-grained benchmarks show that RMG-PMSI significantly improves performance, with good robustness and transferability.
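
The abstract does not specify how the Recursive Mosaic Generator is built, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes that "recursive" means splitting an image into four quadrants, shuffling them, and repeating the process inside each quadrant, so that a `depth` of d yields a mosaic of 2^d x 2^d patches. The function name `recursive_mosaic`, the `depth` parameter, and the nested-list image representation are all hypothetical choices made for this example.

```python
import random


def recursive_mosaic(image, depth, rng):
    """Hypothetical mosaic generator (an assumption, not the paper's RMG).

    image: list of rows, each row a list of pixel values.
    depth: number of recursive quadrant shuffles; 0 returns a copy.
    rng:   a random.Random instance, passed in for reproducibility.
    """
    if depth == 0:
        return [row[:] for row in image]

    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be divisible by 2**depth")
    half_h, half_w = height // 2, width // 2

    # Cut the image into its four quadrants.
    quadrants = [
        [row[:half_w] for row in image[:half_h]],  # top-left
        [row[half_w:] for row in image[:half_h]],  # top-right
        [row[:half_w] for row in image[half_h:]],  # bottom-left
        [row[half_w:] for row in image[half_h:]],  # bottom-right
    ]

    # Shuffle the contents of each quadrant at the next finer level first.
    quadrants = [recursive_mosaic(q, depth - 1, rng) for q in quadrants]

    # Then shuffle the quadrants themselves and stitch them back together.
    rng.shuffle(quadrants)
    top = [left + right for left, right in zip(quadrants[0], quadrants[1])]
    bottom = [left + right for left, right in zip(quadrants[2], quadrants[3])]
    return top + bottom


if __name__ == "__main__":
    rng = random.Random(0)
    # A toy 8x8 single-channel "image" whose pixel values are their indices.
    image = [[r * 8 + c for c in range(8)] for r in range(8)]
    # One mosaic per phase, from finest (8x8 patches) to the intact image.
    for depth in (3, 2, 1, 0):
        mosaic = recursive_mosaic(image, depth, rng)
        print(f"depth={depth}: first row {mosaic[0]}")
```

Under this assumption, a progressive schedule would feed the finest mosaics to the early phases and the unshuffled image (`depth=0`) to the last one; whether the paper orders the phases this way is not stated in the abstract.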