Although DETR-based 3D detectors simplify the detection pipeline and produce sparse predictions directly, their performance on 3D object detection from point clouds still lags behind that of dense detectors with post-processing. DETRs usually adopt far more queries than there are ground-truth objects (GTs) in a scene (e.g., 300 queries vs. 40 objects in Waymo), which inevitably incurs many false positives during inference. In this paper, we propose a simple yet effective sparse 3D detector, named Query Contrast Voxel-DETR (ConQueR), to eliminate these challenging false positives and achieve more accurate and sparser predictions. We observe that most false positives overlap heavily in local regions, which is caused by the lack of explicit supervision to discriminate locally similar queries. We thus propose a Query Contrast mechanism that explicitly enhances queries towards their best-matched GTs over all unmatched query predictions. This is achieved by constructing positive and negative GT-query pairs for each GT, and applying a contrastive loss that enhances positive GT-query pairs against negative ones based on feature similarities. ConQueR closes the gap between sparse and dense 3D detectors and reduces false positives by up to ~60%. Our single-frame ConQueR achieves a new state-of-the-art (SOTA) 71.6 mAPH/L2 on the challenging Waymo Open Dataset validation set, outperforming previous SOTA methods (e.g., PV-RCNN++) by over 2.0 mAPH/L2.
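The contrastive loss over GT-query pairs can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an InfoNCE-style objective with cosine similarity, treats every query other than the matched one as a negative for that GT, and uses hypothetical names (`query_contrast_loss`, `gt_emb`, `query_emb`, `matched_idx`) and an arbitrary temperature value.

```python
import numpy as np


def query_contrast_loss(gt_emb, query_emb, matched_idx, temperature=0.1):
    """InfoNCE-style loss over GT-query pairs (illustrative assumption).

    gt_emb:      (G, D) array, one embedding per ground-truth object.
    query_emb:   (Q, D) array, one embedding per query.
    matched_idx: (G,) integer array; matched_idx[i] is the index of the
                 query matched to GT i (the positive pair).
    temperature: hypothetical softmax temperature, not taken from the paper.
    """
    gt_emb = np.asarray(gt_emb, dtype=np.float64)
    query_emb = np.asarray(query_emb, dtype=np.float64)
    matched_idx = np.asarray(matched_idx, dtype=np.int64)

    # L2-normalise so that the dot product is a cosine similarity.
    gt = gt_emb / np.linalg.norm(gt_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)

    # Similarity of every GT to every query, scaled by the temperature.
    logits = gt @ q.T / temperature                      # shape (G, Q)

    # Numerically stable log-softmax over the queries for each GT.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Negative log-probability of the positive (matched) query, averaged over GTs.
    positive = log_prob[np.arange(matched_idx.shape[0]), matched_idx]
    return float(-positive.mean())
```

Under these assumptions, the loss is small when each GT embedding is close to its matched query and far from all others, and it equals log(Q) when all queries are indistinguishable, which is the locally ambiguous case the abstract attributes false positives to.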