Recently, two-stage Deformable DETR introduced the query-based two-stage head, a new type of two-stage head that differs from the region-based two-stage heads of classical detectors such as Faster R-CNN. In query-based two-stage heads, the second stage selects a single feature per detection, called the query, which is processed by a transformer, as opposed to pooling a rectangular grid of features processed by CNNs as in region-based detectors. In this work, we improve the query-based head by improving the prior of the cross-attention operation with anchors, which significantly speeds up convergence while increasing performance. Additionally, we empirically show that with the improved cross-attention prior, the auxiliary losses and iterative bounding box mechanisms typically used by DETR-based detectors are no longer needed. By combining the best of both the classical and the DETR-based detectors, our FQDet head reaches 45.4 AP on the 2017 COCO validation set when using a ResNet-50+TPN backbone, after training for only 12 epochs using the 1x schedule. We outperform other high-performing two-stage heads such as Cascade R-CNN, while using the same backbone and being computationally cheaper. Additionally, when using the large ResNeXt-101-DCN+TPN backbone and multi-scale testing, our FQDet head achieves 52.9 AP on the 2017 COCO test-dev set after only 12 epochs of training. Code is released at https://github.com/CedricPicron/FQDet.
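The contrast between the two kinds of second stage can be illustrated with a toy sketch. This is not the FQDet implementation: the function names `select_queries` and `pool_region`, the top-k selection rule, the nearest-neighbour sampling, and the toy feature map and scores are all assumptions made purely for illustration.

```python
def select_queries(features, scores, k):
    """Query-based (illustrative): keep one feature vector at each of the
    k top-scoring locations; each kept vector plays the role of a query."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [features[i] for i in top]


def pool_region(feature_map, box, grid=2):
    """Region-based (illustrative): sample a grid x grid block of feature
    vectors inside a box, using nearest-neighbour sampling at cell centres."""
    x0, y0, x1, y1 = box
    pooled = []
    for gy in range(grid):
        for gx in range(grid):
            # Centre of the grid cell, truncated to a feature-map index.
            x = int(x0 + (gx + 0.5) * (x1 - x0) / grid)
            y = int(y0 + (gy + 0.5) * (y1 - y0) / grid)
            pooled.append(feature_map[y][x])
    return pooled


# Toy 4x4 feature map with 2-channel features: the feature at (y, x) is [y, x].
feature_map = [[[y, x] for x in range(4)] for y in range(4)]
flat_features = [f for row in feature_map for f in row]
scores = list(range(16))  # made-up first-stage scores, one per location

queries = select_queries(flat_features, scores, k=3)          # 3 vectors, one per detection
region = pool_region(feature_map, box=(0, 0, 4, 4), grid=2)   # 4 vectors for a single detection
```

In this sketch each detection in the query-based case is represented by exactly one feature vector, whereas the region-based case yields `grid * grid` vectors per box.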