Recent research in self-supervised learning (SSL) has shown that it can learn useful semantic representations from images for classification tasks. In this work, we study the usefulness of SSL for Fine-Grained Visual Categorization (FGVC). FGVC aims to distinguish objects of visually similar subcategories within a general category. Small inter-class variations combined with large intra-class variations make it a challenging task. The limited availability of annotated labels for such fine-grained data motivates the use of SSL, where additional supervision can boost learning without the cost of extra annotations. Our baseline achieves $86.36\%$ top-1 classification accuracy on the CUB-200-2011 dataset by using random-crop augmentation during training and center-crop augmentation during testing. We explore the usefulness of several pretext tasks for FGVC, specifically rotation, pretext-invariant representation learning (PIRL), and deconstruction and construction learning (DCL). Rotation as an auxiliary task encourages the model to learn global features and diverts it from focusing on subtle details. PIRL, which uses jigsaw patches, attempts to focus on discriminative local regions but struggles to localize them accurately. DCL helps the model learn discriminative local features and outperforms the baseline, achieving $87.41\%$ top-1 accuracy. Deconstruction learning forces the model to focus on local object parts, while construction learning helps it learn the correlation between those parts. We perform extensive experiments to explain our findings. Our code is available at https://github.com/mmaaz60/ssl_for_fgvc.
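The abstract describes the rotation and DCL pretext tasks only in words. As a rough illustration, the sketch below shows how training inputs for them might be generated. In the rotation task, an auxiliary head predicts which of four 90-degree rotations was applied. In a DCL-style "deconstruction" step, the image is split into an `n` x `n` grid of patches, and each patch is shuffled only within a small neighborhood. The shuffle uses the jitter-and-sort region-confusion scheme from the original DCL work. This is a toy, pure-Python sketch on 2-D integer grids, not code from the linked repository. The function names, the grid size `n`, and the neighborhood size `k` are illustrative assumptions. The networks and losses, including construction learning, are omitted.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]  # toy "image": a 2-D grid of pixel values (illustrative assumption)


def rot90(img: Grid) -> Grid:
    """Rotate a 2-D grid 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotation_samples(img: Grid) -> List[Tuple[Grid, int]]:
    """Rotation pretext task: return (rotated image, label) pairs.

    Label j means the image was rotated by j * 90 degrees counterclockwise;
    an auxiliary classifier head would be trained to predict j.
    """
    samples = []
    current = img
    for label in range(4):
        samples.append((current, label))
        current = rot90(current)
    return samples


def local_permutation(n: int, k: int, rng: random.Random) -> List[int]:
    """Permutation of range(n) in which each index moves fewer than 2k slots.

    Jitter-and-sort: give position i the noisy key i + r with r ~ U(-k, k),
    then sort positions by key. Entry p of the result is the original index
    that ends up at position p.
    """
    assert 1 <= k < n, "neighborhood size must satisfy 1 <= k < n"
    keys = [i + rng.uniform(-k, k) for i in range(n)]
    return sorted(range(n), key=lambda i: keys[i])


def region_confusion(img: Grid, n: int, k: int, seed: int = 0) -> Grid:
    """DCL-style deconstruction sketch: locally shuffle an n x n grid of patches.

    Patches are first permuted within each row of the grid, then within each
    column, so every patch stays near its original location.
    """
    h, w = len(img), len(img[0])
    assert h % n == 0 and w % n == 0, "toy version: image size must divide evenly by n"
    ph, pw = h // n, w // n
    rng = random.Random(seed)
    # layout[r][c] = (source row, source col) of the patch placed at cell (r, c)
    layout = [[(r, c) for c in range(n)] for r in range(n)]
    for r in range(n):  # horizontal shuffle within each row of patches
        perm = local_permutation(n, k, rng)
        layout[r] = [layout[r][perm[c]] for c in range(n)]
    for c in range(n):  # vertical shuffle within each column of patches
        perm = local_permutation(n, k, rng)
        column = [layout[r][c] for r in range(n)]
        for r in range(n):
            layout[r][c] = column[perm[r]]
    # Copy every source patch, pixel by pixel, into its new grid cell.
    out = [[0] * w for _ in range(h)]
    for r in range(n):
        for c in range(n):
            sr, sc = layout[r][c]
            for y in range(ph):
                for x in range(pw):
                    out[r * ph + y][c * pw + x] = img[sr * ph + y][sc * pw + x]
    return out
```

In this sketch, `rot90([[1, 2], [3, 4]])` gives `[[2, 4], [1, 3]]`. A smaller `k` keeps shuffled patches closer to their original positions, so the model still sees plausible local structure while losing the global layout. That is the intuition the abstract gives for why deconstruction pushes the model toward local object parts.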