Unsupervised pretraining has recently proven beneficial for computer vision tasks, including object detection. However, previous self-supervised approaches are not designed to handle a key aspect of detection: localizing objects. Here, we present DETReg, an unsupervised pretraining approach for object DEtection with TRansformers using Region priors. Motivated by the two tasks underlying object detection, localization and categorization, we combine two complementary signals for self-supervision. For the object localization signal, we use pseudo ground truth object bounding boxes from an off-the-shelf unsupervised region proposal method, Selective Search, which does not require training data and detects objects with high recall but very low precision. The categorization signal comes from an object embedding loss that encourages invariant object representations, from which the object category can be inferred. We show how to combine these two signals to train the Deformable DETR detection architecture from large amounts of unlabeled data. DETReg improves performance over competitive baselines and previous self-supervised methods on standard benchmarks such as MS COCO and PASCAL VOC. DETReg also outperforms previous supervised and unsupervised baseline approaches in the low-data regime, when trained with only 1%, 2%, 5%, and 10% of the labeled data on MS COCO. For code and pretrained models, visit the project page at https://amirbar.net/detreg
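As a rough illustration of how a localization signal and an embedding signal might be combined into one pretraining objective, the sketch below adds a box regression term (against pseudo ground truth boxes) to an embedding term (against target object embeddings). It is not the DETReg implementation: the function names, the use of an L1 distance for both terms, the loss weights, and the assumption that predictions are already matched one-to-one with pseudo boxes are all hypothetical choices made for illustration.

```python
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def pretraining_loss(
    pred_boxes: Sequence[Sequence[float]],
    pseudo_boxes: Sequence[Sequence[float]],
    pred_embeddings: Sequence[Sequence[float]],
    target_embeddings: Sequence[Sequence[float]],
    box_weight: float = 1.0,    # hypothetical weight, not from the paper
    embed_weight: float = 1.0,  # hypothetical weight, not from the paper
) -> float:
    """Weighted sum of a localization term and an embedding term.

    Assumes the i-th prediction is already matched to the i-th pseudo
    ground truth box (e.g. a region proposal) and its target embedding.
    """
    n = len(pred_boxes)
    if not (n == len(pseudo_boxes) == len(pred_embeddings) == len(target_embeddings)):
        raise ValueError("all inputs must contain the same number of objects")
    if n == 0:
        return 0.0

    # Localization signal: how far predicted boxes are from pseudo boxes.
    box_loss = sum(l1_distance(p, t) for p, t in zip(pred_boxes, pseudo_boxes)) / n

    # Categorization signal: how far predicted object embeddings are
    # from the target embeddings of the same regions.
    embed_loss = sum(
        l1_distance(p, t) for p, t in zip(pred_embeddings, target_embeddings)
    ) / n

    return box_weight * box_loss + embed_weight * embed_loss


if __name__ == "__main__":
    # One object, boxes given as (cx, cy, w, h) in normalized coordinates.
    loss = pretraining_loss(
        pred_boxes=[[0.5, 0.5, 0.25, 0.25]],
        pseudo_boxes=[[0.5, 0.5, 0.5, 0.25]],
        pred_embeddings=[[1.0, 0.0]],
        target_embeddings=[[0.0, 0.0]],
    )
    print(loss)
```

Keeping the two terms separate with their own weights makes it easy to study each signal in isolation, for example by setting `embed_weight=0.0` to train on the localization signal alone.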