The challenge of fine-grained visual recognition often lies in discovering the key discriminative regions. While such regions can be identified automatically from a large-scale labeled dataset, the same approach may become less effective when only a few annotations are available. In low-data regimes, a network often struggles to choose the correct regions for recognition and tends to overfit to spuriously correlated patterns in the training data. To tackle this issue, this paper proposes the self-boosting attention mechanism (SAM), a novel method for regularizing the network to focus on the key regions shared across samples and classes. Specifically, the proposed method first generates an attention map for each training image, highlighting the discriminative part for identifying the ground-truth object category. The generated attention maps are then used as pseudo-annotations, and the network is trained to fit them as an auxiliary task. We also develop a variant, dubbed SAM-Bilinear, that uses SAM to create multiple attention maps for pooling convolutional feature maps in the style of bilinear pooling. Through extensive experimental studies, we show that both methods can significantly improve fine-grained visual recognition performance in low-data regimes and can be incorporated into existing network architectures. The source code is publicly available at: https://github.com/GANPerf/SAM
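The steps described above (generate an attention map per training image, treat it as a pseudo-annotation, fit it as an auxiliary task, and pool features under several attention maps) can be sketched roughly as follows. This is a framework-free illustration, not the authors' implementation: the abstract does not say how the attention map is produced or which loss fits it, so the sketch assumes a class-activation-style map and a mean-squared-error auxiliary loss. The function names, the `aux_weight` parameter, and the averaging used in the pooling step are all hypothetical.

```python
# Illustrative sketch only. Assumptions (not stated in the abstract):
#   - a CAM-style weighted channel sum stands in for the attention generator
#   - mean squared error stands in for the auxiliary attention-fitting loss
#   - spatial averaging stands in for the pooling used by SAM-Bilinear


def class_attention_map(features, class_weights):
    """Build an H x W attention map for the ground-truth class.

    features: list of C channels, each an H x W grid of floats.
    class_weights: C classifier weights for the ground-truth class.
    Returns the rectified weighted channel sum, scaled to [0, 1].
    """
    height, width = len(features[0]), len(features[0][0])
    raw = [
        [
            max(0.0, sum(w * fmap[i][j] for w, fmap in zip(class_weights, features)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    peak = max(max(row) for row in raw)
    if peak == 0.0:
        return raw  # all-zero map, nothing to normalize
    return [[value / peak for value in row] for row in raw]


def attention_fit_loss(predicted, pseudo_label):
    """Mean squared error between a predicted map and its pseudo-annotation."""
    squared = [
        (p - t) ** 2
        for p_row, t_row in zip(predicted, pseudo_label)
        for p, t in zip(p_row, t_row)
    ]
    return sum(squared) / len(squared)


def total_loss(classification_loss, predicted, pseudo_label, aux_weight=1.0):
    """Classification loss plus the weighted auxiliary attention loss."""
    return classification_loss + aux_weight * attention_fit_loss(predicted, pseudo_label)


def attention_pool(features, attention_maps):
    """Pool every channel under every attention map.

    Returns a flat descriptor of length K * C for K maps and C channels.
    """
    descriptor = []
    for amap in attention_maps:
        for fmap in features:
            weighted = sum(
                a * f
                for a_row, f_row in zip(amap, fmap)
                for a, f in zip(a_row, f_row)
            )
            descriptor.append(weighted / (len(fmap) * len(fmap[0])))
    return descriptor


# Toy example: two 2 x 2 feature channels and weights for one class.
features = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 0.0]]]
pseudo_label = class_attention_map(features, [1.0, -1.0])
untrained_prediction = [[0.0, 0.0], [0.0, 0.0]]
loss = total_loss(1.0, untrained_prediction, pseudo_label, aux_weight=0.5)
pooled = attention_pool(features, [pseudo_label])
```

In a real training loop the pseudo-annotation would be treated as a fixed target (no gradient flowing through it), and the auxiliary term would be added to the usual classification loss for each batch. How the two terms are actually weighted is a detail of the paper and its repository, not something this sketch settles.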