We present a strong object detector with encoder-decoder pretraining and finetuning. Our method, called Group DETR v2, is built upon a vision transformer encoder ViT-Huge~\cite{dosovitskiy2020image}, a DETR variant DINO~\cite{zhang2022dino}, and an efficient DETR training method Group DETR~\cite{chen2022group}. The training process consists of three stages: self-supervised pretraining and finetuning of a ViT-Huge encoder on ImageNet-1K, pretraining of the detector on Objects365, and finally finetuning of the detector on COCO. Group DETR v2 achieves $\textbf{64.5}$ mAP on COCO test-dev and establishes a new SoTA on the COCO leaderboard (https://paperswithcode.com/sota/object-detection-on-coco).