Unsupervised representation learning has achieved promising performance in pre-training representations for object detectors. However, previous approaches are mainly designed for image-level classification, leading to suboptimal detection performance. To bridge this performance gap, this work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID), a contrastive pretext task that learns location-discriminative representations without supervision and offers appealing advantages over its counterparts. First, unlike fully supervised person Re-ID, which matches a human identity across different camera views, patch Re-ID treats an important patch as a pseudo identity and contrastively learns its correspondence across two different views of an image, where the pseudo identity undergoes different translations and transformations, enabling the learning of discriminative features for object detection. Second, patch Re-ID is performed in a Deeply Unsupervised manner to learn multi-level representations, which is appealing for object detection. Third, extensive experiments show that our method significantly outperforms its counterparts on COCO across all settings, including different numbers of training iterations and data percentages. For example, Mask R-CNN initialized with our representation surpasses MoCo v2 and even its fully supervised counterpart across all training-iteration setups (e.g., 2.1 and 1.1 mAP improvements over MoCo v2 at 12k and 90k iterations, respectively). Code will be released at https://github.com/dingjiansw101/DUPR.
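To make the patch Re-ID idea concrete, below is a minimal sketch of a patch-level contrastive (InfoNCE-style) objective, assuming patch embeddings of the same pseudo identities have already been pooled from two augmented views by a backbone; the function and parameter names (`patch_infonce`, `temperature`) are illustrative and not the authors' implementation.

```python
# Minimal sketch of a patch-level contrastive objective (assumed formulation):
# patch i in view 1 should match patch i in view 2 and mismatch all others.
import torch
import torch.nn.functional as F


def patch_infonce(patches_view1: torch.Tensor,
                  patches_view2: torch.Tensor,
                  temperature: float = 0.2) -> torch.Tensor:
    """Contrast corresponding patch embeddings across two views.

    patches_view1, patches_view2: (N, D) embeddings of the same N patches
    (pseudo identities) observed under different augmentations.
    """
    q = F.normalize(patches_view1, dim=1)
    k = F.normalize(patches_view2, dim=1)
    logits = q @ k.t() / temperature                    # (N, N) similarities
    targets = torch.arange(q.size(0), device=q.device)  # positives on diagonal
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    # Toy usage: 8 patch embeddings of dimension 128 from each view.
    v1, v2 = torch.randn(8, 128), torch.randn(8, 128)
    print(patch_infonce(v1, v2).item())
```

In a deeply unsupervised variant, one would presumably apply such a loss to patch features pooled from multiple feature levels rather than only the final layer; the exact loss form and feature extraction details follow the released code rather than this sketch.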