Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of the detection architecture. Instead, we introduce DETReg, a new self-supervised method that pretrains the entire object detection network, including the object localization and embedding components. During pretraining, DETReg predicts object localizations to match the localizations from an unsupervised region proposal generator and simultaneously aligns the corresponding feature embeddings with embeddings from a self-supervised image encoder. We implement DETReg using the DETR family of detectors and show that it improves over competitive baselines when finetuned on the COCO, PASCAL VOC, and Airbus Ship benchmarks. In low-data regimes, including semi-supervised and few-shot learning settings, DETReg establishes many state-of-the-art results, e.g., on COCO we see a +6.0 AP improvement for 10-shot detection and a +3.5 AP improvement when training with only 1\% of the labels. For code and pretrained models, visit the project page at https://amirbar.net/detreg
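To make the pretraining objective described above concrete, here is a minimal sketch of a DETReg-style loss, not the paper's implementation: detector outputs are matched to unsupervised region proposals, and both the predicted boxes and the predicted per-object embeddings are pulled toward their matched targets. All function names, tensor shapes, the L1-only matching cost, and the loss weight are illustrative assumptions; the full method also uses a GIoU box term and an objectness classification term, omitted here for brevity.

```python
# Hedged sketch of a DETReg-style pretraining loss (assumptions, not the paper's code):
#  - pred_boxes, pred_embs come from the detector's localization/embedding heads,
#  - tgt_boxes come from an unsupervised region proposal generator (e.g., Selective Search),
#  - tgt_embs come from a frozen self-supervised image encoder applied to the region crops.
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def detreg_loss(pred_boxes, pred_embs, tgt_boxes, tgt_embs, emb_weight=1.0):
    """pred_boxes: (Q, 4), pred_embs: (Q, D); tgt_boxes: (K, 4), tgt_embs: (K, D)."""
    # Match queries to proposals with an L1 box cost; the paper uses Hungarian
    # matching over a richer cost, so this is a simplification.
    cost = torch.cdist(pred_boxes, tgt_boxes, p=1)  # (Q, K)
    q_idx, k_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    q_idx, k_idx = torch.as_tensor(q_idx), torch.as_tensor(k_idx)
    # Localization: predicted boxes should match the unsupervised proposals.
    loss_box = F.l1_loss(pred_boxes[q_idx], tgt_boxes[k_idx])
    # Embedding alignment: predicted object embeddings should match the frozen
    # self-supervised embeddings of the corresponding regions.
    loss_emb = F.l1_loss(pred_embs[q_idx], tgt_embs[k_idx])
    return loss_box + emb_weight * loss_emb

# Illustrative shapes only: 100 object queries, 30 proposals, 256-dim embeddings.
loss = detreg_loss(torch.rand(100, 4), torch.randn(100, 256),
                   torch.rand(30, 4), torch.randn(30, 256))
```

Because the targets come from an unsupervised proposal generator and a self-supervised encoder, this loss requires no human labels, which is what allows the entire detection network, not just the backbone, to be pretrained.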