Three-dimensional (3D) object detection is essential in autonomous driving. Multi-modality methods that use both point cloud and image features have been observed to perform only marginally better, and sometimes worse, than approaches that use point clouds alone. This paper investigates the reason behind this counter-intuitive phenomenon through a careful comparison of the augmentation techniques used by single-modality and multi-modality methods. We find that existing augmentations practiced in single-modality detection are equally useful for multi-modality detection. We then present a new multi-modality augmentation approach, Multi-mOdality Cut and pAste (MoCa). MoCa boosts detection performance by cutting point cloud and image patches of ground-truth objects and pasting them into different scenes in a consistent manner while avoiding collisions between objects. We also explore beneficial architecture design and optimization practices for implementing a good multi-modality detector. Without using an ensemble of detectors, our multi-modality detector achieves new state-of-the-art performance on the nuScenes dataset and competitive performance on the KITTI 3D benchmark. Our method also won the best PKL award in the 3rd nuScenes detection challenge. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.
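The cut-and-paste idea described above can be illustrated with a minimal sketch. This is not the released implementation: the function names, the dictionary keys, and the axis-aligned bird's-eye-view (BEV) overlap test are all assumptions made for illustration, and the sketch assumes each image patch fits inside the target image. It shows only the core constraint that an object's points and its image patch are pasted together, and only when the object does not collide with anything already in the scene.

```python
import numpy as np


def bev_overlap(box_a, box_b):
    """Axis-aligned BEV overlap test (a simplification chosen for this sketch).

    Boxes are (x_min, y_min, x_max, y_max) in the ground plane.
    """
    return (box_a[0] < box_b[2] and box_b[0] < box_a[2]
            and box_a[1] < box_b[3] and box_b[1] < box_a[3])


def cut_and_paste(scene_points, scene_image, scene_boxes, candidates):
    """Paste non-colliding ground-truth objects into a scene in both modalities.

    Each candidate is a dict with hypothetical keys:
      'points'   (N, 3) array of the object's points,
      'bev_box'  (x_min, y_min, x_max, y_max),
      'patch'    (h, w, 3) image crop of the object,
      'top_left' (row, col) where the patch lands in the target image.
    """
    points = [scene_points]
    image = scene_image.copy()
    boxes = list(scene_boxes)
    for obj in candidates:
        # Skip any object that would collide with one already in the scene.
        if any(bev_overlap(obj['bev_box'], b) for b in boxes):
            continue
        # Paste both modalities together so they stay consistent.
        points.append(obj['points'])
        row, col = obj['top_left']
        h, w = obj['patch'].shape[:2]
        image[row:row + h, col:col + w] = obj['patch']
        boxes.append(obj['bev_box'])
    return np.concatenate(points, axis=0), image, boxes
```

Accepted objects are appended to the box list, so later candidates are also checked against earlier pastes, not only against the original scene. A fuller version would use rotated boxes and would also have to handle occlusion between pasted patches in the image plane, which this sketch ignores.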