We present DINO (\textbf{D}ETR with \textbf{I}mproved de\textbf{N}oising anch\textbf{O}r boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive approach to denoising training, a mixed query selection method for anchor initialization, and a look-forward-twice scheme for box prediction. DINO achieves $49.4$AP in $12$ epochs and $51.3$AP in $24$ epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding significant improvements of $\textbf{+6.0}$\textbf{AP} and $\textbf{+2.7}$\textbf{AP}, respectively, over DN-DETR, the previous best DETR-like model. DINO scales well in both model size and data size. Without bells and whistles, after pre-training on the Objects365 dataset with a SwinL backbone, DINO obtains the best results on both COCO \texttt{val2017} ($\textbf{63.2}$\textbf{AP}) and \texttt{test-dev} ($\textbf{63.3}$\textbf{AP}). Compared to other models on the leaderboard, DINO uses a significantly smaller model and less pre-training data while achieving better results. Our code will be available at \url{https://github.com/IDEACVR/DINO}.