A core component of the recent success of self-supervised learning is cropping data augmentation, which selects sub-regions of an image to be used as positive views in the self-supervised loss. The underlying assumption is that randomly cropped and resized regions of a given image share information about the objects of interest, which the learned representation will capture. This assumption is mostly satisfied in datasets such as ImageNet, where there is a large, centered object that is highly likely to be present in random crops of the full image. However, in other datasets such as OpenImages or COCO, which are more representative of real-world uncurated data, there are typically multiple small objects in an image. In this work, we show that self-supervised learning based on the usual random cropping performs poorly on such datasets. We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm. This encourages the model to learn both object- and scene-level semantic representations. This approach, which we call object-aware cropping, yields significant improvements over scene cropping on classification and object detection benchmarks. For example, on OpenImages, our approach achieves an improvement of 8.8% mAP over random scene-level cropping using MoCo-v2-based pre-training. We also show significant improvements on COCO and PASCAL-VOC object detection and segmentation tasks over state-of-the-art self-supervised learning approaches. Our approach is efficient, simple, and general, and can be used in most existing contrastive and non-contrastive self-supervised learning frameworks.
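To make the idea concrete, here is a minimal Python sketch of how one of the two random crops in a two-view pipeline (e.g., MoCo-v2) could be replaced by an object-proposal crop. The `get_object_proposals` helper is a hypothetical placeholder for any unsupervised proposal method (e.g., selective search); this is an illustrative sketch under those assumptions, not the paper's implementation.

```python
# Minimal sketch of object-aware cropping for a two-view SSL pipeline.
# `get_object_proposals` is a placeholder (an assumption, not the paper's
# method): plug in selective search or any unsupervised proposal algorithm.

import random
from PIL import Image
import torchvision.transforms as T


def get_object_proposals(img: Image.Image) -> list[tuple[int, int, int, int]]:
    """Return candidate object boxes as (left, top, width, height).

    Hypothetical stub: replace with a real unsupervised proposal method.
    """
    raise NotImplementedError


# Standard scene-level random crop, as in common SSL augmentation stacks.
scene_crop = T.RandomResizedCrop(224, scale=(0.2, 1.0))
post = T.Compose([T.RandomHorizontalFlip(), T.ToTensor()])


def object_aware_views(img: Image.Image):
    """Produce two positive views: one scene-level random crop and one
    crop taken from a randomly chosen object proposal. (The paper also
    considers replacing both crops with proposal-based crops.)"""
    left, top, w, h = random.choice(get_object_proposals(img))
    object_view = img.crop((left, top, left + w, top + h)).resize((224, 224))
    scene_view = scene_crop(img)
    return post(scene_view), post(object_view)
```

The two returned tensors would then be fed to the contrastive (or non-contrastive) loss exactly as the usual pair of random crops would be.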