Decades of work on fine-grained visual classification (FGVC) have converged on one key direction: finding discriminative local regions and revealing subtle differences. However, unlike identifying visual content within static images, recognizing objects in the real physical world involves discriminative information that is present not only within the seen local regions but also in other, unseen perspectives. In other words, efficient and accurate recognition requires not only focusing on the distinguishable part of the whole but also inferring the key perspective from a few glances. For example, people may recognize a "Benz AMG GT" from a glance at its front and then know that looking at its exhaust pipe can help tell which year's model it is. In this paper, returning to this real-world setting, we put forward the problem of active fine-grained recognition (AFGR) and complete the study in three steps: (i) a hierarchical, multi-view, fine-grained vehicle dataset is collected as the testbed; (ii) a simple experiment is designed to verify that different perspectives contribute differently to FGVC and that different categories have different discriminative perspectives; (iii) a policy-gradient-based framework is adopted to achieve efficient recognition with active view selection. Comprehensive experiments demonstrate that the proposed method delivers a better performance-efficiency trade-off than previous FGVC methods and advanced neural networks.
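To make the idea of policy-gradient view selection concrete, the following is a minimal toy sketch and not the framework proposed in the paper. It assumes a tabular softmax policy, a coarse category as the state, and a binary reward that stands in for "the fine-grained classifier was correct after looking at the chosen view". All names (`train_view_policy`, `informative_view`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_view_policy(informative_view, num_views, steps=5000, lr=0.1, seed=0):
    """Toy REINFORCE loop for choosing which view to look at next.

    informative_view[s] is the (assumed) most discriminative view for
    coarse category s. Reward is 1.0 when the policy picks that view and
    0.0 otherwise, a stand-in for downstream classification accuracy.
    """
    rng = random.Random(seed)
    num_states = len(informative_view)
    # One row of view logits per coarse category (tabular policy).
    theta = [[0.0] * num_views for _ in range(num_states)]
    baseline = 0.0  # running mean reward, used to reduce gradient variance

    for _ in range(steps):
        s = rng.randrange(num_states)           # sample a coarse category
        probs = softmax(theta[s])
        a = rng.choices(range(num_views), weights=probs)[0]  # sample a view
        r = 1.0 if a == informative_view[s] else 0.0
        advantage = r - baseline
        # Gradient of log pi(a|s) for a softmax policy: onehot(a) - probs.
        for v in range(num_views):
            grad = (1.0 if v == a else 0.0) - probs[v]
            theta[s][v] += lr * advantage * grad
        baseline += 0.01 * (r - baseline)

    return theta


if __name__ == "__main__":
    # Two coarse categories, four candidate views (e.g. front, side, rear, top).
    theta = train_view_policy(informative_view=[2, 0], num_views=4)
    for s, row in enumerate(theta):
        print(s, [round(p, 3) for p in softmax(row)])
```

In this sketch the policy learns a different preferred view for each coarse category, mirroring the claim that different categories have different discriminative perspectives. A realistic system would replace the lookup-table reward with a learned classifier and condition the policy on image features.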