Self-supervised learning (SSL) strategies have demonstrated remarkable performance in various recognition tasks. However, both our preliminary investigation and recent studies suggest that they may be less effective at learning representations for fine-grained visual recognition (FGVR), since many features that help optimize SSL objectives are not suitable for characterizing the subtle differences in FGVR. To overcome this issue, we propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes, dubbed common rationales in this paper. Intuitively, common rationales tend to correspond to the discriminative patterns from the key parts of foreground objects. We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective, without using any pre-trained object-part or saliency detectors, which allows it to be integrated seamlessly into the existing SSL process. Specifically, we fit the GradCAM with a branch of limited fitting capacity, which allows the branch to capture the common rationales and discard the less common discriminative patterns. At the test stage, the branch generates a set of spatial weights to selectively aggregate the features representing an instance. Extensive experimental results on four visual tasks demonstrate that the proposed method can lead to significant improvements in different evaluation settings.
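The test-stage aggregation can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `spatial_weights` and `aggregate` are hypothetical, the softmax normalization of the branch scores is an assumption, and the per-location scores are supplied by hand rather than produced by a learned branch.

```python
import math


def spatial_weights(scores):
    """Turn per-location scores into weights that sum to 1.

    Assumption: softmax normalization. The abstract only says the branch
    produces "a set of spatial weights".
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(features, weights):
    """Weighted sum of per-location feature vectors.

    features: list of H*W vectors, each of length C (a flattened feature map)
    weights:  list of H*W weights, one per spatial location
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per spatial location")
    dim = len(features[0])
    pooled = [0.0] * dim
    for vec, w in zip(features, weights):
        for c in range(dim):
            pooled[c] += w * vec[c]
    return pooled


# Hypothetical 2x2 feature map with C = 2 channels, flattened to 4 locations.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 6.0]]

# Equal scores reduce to plain average pooling.
uniform = aggregate(features, spatial_weights([0.0, 0.0, 0.0, 0.0]))

# A high score at the first location makes that location dominate the result.
focused = aggregate(features, spatial_weights([10.0, 0.0, 0.0, 0.0]))
```

With equal scores the result matches global average pooling, so the weights only change the representation where the branch scores some locations above others.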