In this work, we propose a novel self-supervised learning methodology for generating global and local attention-aware visual features. Our approach trains a model to discriminate between specific image transformations of an input sample and its patched counterparts. With this approach, the proposed method outperforms the previous best competitor by 1.03% on the Tiny-ImageNet dataset and by 2.32% on the STL-10 dataset. Furthermore, our approach surpasses the fully-supervised baseline on the STL-10 dataset. Experimental results and visualizations demonstrate that the method successfully learns global and local attention-aware visual representations.
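To make the pretext task concrete, below is a minimal PyTorch sketch of transformation discrimination: a network is trained to predict which transformation (including a patch-level shuffle) was applied to an image. The transform set, the `patch_shuffle` operation, and the tiny CNN are illustrative assumptions for exposition, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

def patch_shuffle(x):
    # Hypothetical patch operation: split the image into four quadrants
    # and swap them diagonally, producing a "patched image".
    c, h, w = x.shape
    tl, tr = x[:, : h // 2, : w // 2], x[:, : h // 2, w // 2 :]
    bl, br = x[:, h // 2 :, : w // 2], x[:, h // 2 :, w // 2 :]
    top = torch.cat([br, bl], dim=2)
    bot = torch.cat([tr, tl], dim=2)
    return torch.cat([top, bot], dim=1)

# Assumed transform set: identity, three rotations, and the patch shuffle.
TRANSFORMS = [
    lambda x: x,                 # label 0: identity
    lambda x: TF.rotate(x, 90),  # label 1
    lambda x: TF.rotate(x, 180), # label 2
    lambda x: TF.rotate(x, 270), # label 3
    patch_shuffle,               # label 4: patched image
]

class PretextClassifier(nn.Module):
    """Tiny CNN that predicts which transformation was applied."""
    def __init__(self, num_classes=len(TRANSFORMS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# One self-supervised training step: apply a random transformation and
# train the network to recognize it; the learned features then transfer
# to downstream tasks.
model = PretextClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.rand(8, 3, 64, 64)  # stand-in batch (Tiny-ImageNet-sized)
labels = torch.randint(len(TRANSFORMS), (8,))
inputs = torch.stack(
    [TRANSFORMS[l](img) for img, l in zip(images, labels.tolist())]
)

loss = criterion(model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```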