Self-supervised learning (SSL) methods have achieved significant success by maximizing the mutual information between two augmented views, with cropping being a popular augmentation technique. Cropped regions are widely used to construct positive pairs, while the regions left over after cropping have rarely been explored in existing methods, although the two together constitute the same image instance and both contribute to the description of the category. In this paper, we make the first attempt to demonstrate the importance of both regions in cropping from a complete perspective and propose a simple yet effective pretext task called Region Contrastive Learning (RegionCL). Specifically, given two different images, we randomly crop a region of the same size (called the paste view) from each image and swap the two regions, so that each is combined with the remaining region of the other image (called the canvas view) to compose a new image. Contrastive pairs can then be constructed efficiently according to two simple criteria: each view is (1) positive with views augmented from the same original image and (2) negative with views augmented from other images. With minor modifications to popular SSL methods, RegionCL exploits these abundant pairs and helps the model distinguish the region features from both canvas and paste views, thereby learning better visual representations. Experiments on ImageNet, MS COCO, and Cityscapes demonstrate that RegionCL improves MoCo v2, DenseCL, and SimSiam by large margins and achieves state-of-the-art performance on classification, detection, and segmentation tasks. The code will be available at https://github.com/Annbless/RegionCL.git.
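To make the region swap and the pairing criteria concrete, the following is a minimal NumPy sketch and not the authors' implementation. The function names `region_swap` and `positive_mask` are hypothetical, and the sketch assumes that both images have the same shape and that the region is cropped at the same location in both images; the abstract only states that the two regions have the same size.

```python
import numpy as np


def region_swap(img_a, img_b, crop_h, crop_w, rng):
    """Swap a same-size random rectangle between two images of equal shape.

    Assumption: the rectangle sits at the same location in both images.
    Returns the two composed images and the box as (top, left, height, width).
    """
    assert img_a.shape == img_b.shape, "sketch assumes equally sized images"
    h, w = img_a.shape[:2]
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    box = (slice(top, top + crop_h), slice(left, left + crop_w))

    # Copy so the original images stay untouched.
    mixed_a, mixed_b = img_a.copy(), img_b.copy()
    mixed_a[box] = img_b[box]  # canvas view from A, paste view from B
    mixed_b[box] = img_a[box]  # canvas view from B, paste view from A
    return mixed_a, mixed_b, (top, left, crop_h, crop_w)


def positive_mask(source_ids):
    """Boolean matrix: entry (i, j) is True when views i and j (i != j)
    come from the same original image; every other pair is negative."""
    ids = np.asarray(source_ids)
    mask = ids[:, None] == ids[None, :]
    np.fill_diagonal(mask, False)  # a view is not paired with itself
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img_a = np.zeros((8, 8, 3))  # stand-in for image A
    img_b = np.ones((8, 8, 3))   # stand-in for image B
    mixed_a, mixed_b, box = region_swap(img_a, img_b, 3, 4, rng)

    # View order: canvas of mixed_a, paste of mixed_a,
    #             canvas of mixed_b, paste of mixed_b.
    # Source ids: 0 = image A, 1 = image B.
    print(positive_mask([0, 1, 1, 0]).astype(int))
```

In this sketch the canvas view of one composed image is positive with the paste view of the other, since both originate from the same source image, while the canvas and paste views inside a single composed image form a negative pair.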