3D object detection is a critical task in autonomous driving. Recently, multi-modal fusion-based 3D object detection methods, which combine the complementary advantages of LiDAR and camera, have shown great performance improvements over mono-modal methods. However, so far, no method has attempted to utilize instance-level contextual image semantics to guide 3D object detection. In this paper, we propose a simple and effective Painting Adaptive Instance-prior for 3D object detection (PAI3D) to fuse instance-level image semantics flexibly with point cloud features. PAI3D is a multi-modal sequential instance-level fusion framework. It first extracts instance-level semantic information from images; the extracted information, including object categorical labels, point-to-object membership, and object positions, is then used to augment each LiDAR point in the subsequent 3D detection network to guide and improve detection performance. PAI3D outperforms the state-of-the-art by a large margin on the nuScenes dataset, achieving 71.4 mAP and 74.2 NDS on the test split. Our comprehensive experiments show that instance-level image semantics contribute the most to the performance gain, and PAI3D works well with any good-quality instance segmentation model and any modern point cloud 3D encoder, making it a strong candidate for deployment on autonomous vehicles.
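To make the fusion idea concrete, the following is a minimal sketch (not the authors' implementation) of how instance-level image semantics could be appended to each LiDAR point before it enters a 3D detection network. The function names (`paint_points_with_instance_priors`, `lidar_to_image`), the exact feature layout, and the way object positions are obtained are all illustrative assumptions.

```python
import numpy as np

def paint_points_with_instance_priors(points, instance_masks, instance_labels,
                                       instance_centers, lidar_to_image):
    """Append instance-level semantics to raw LiDAR points (illustrative sketch).

    points:            (N, 3) xyz coordinates in the LiDAR frame.
    instance_masks:    (M, H, W) boolean masks from a 2D instance segmentation model.
    instance_labels:   (M,) categorical label per detected instance.
    instance_centers:  (M, 3) estimated 3D object positions (assumed available here
                       purely for illustration).
    lidar_to_image:    callable mapping (N, 3) LiDAR points to (N, 2) pixel coords.
    """
    uv = np.round(lidar_to_image(points)).astype(int)      # project points to image
    n_pts = points.shape[0]
    labels = np.zeros(n_pts, dtype=np.int64)                # 0 = background class
    membership = -np.ones(n_pts, dtype=np.int64)            # -1 = not inside any instance
    offsets = np.zeros((n_pts, 3), dtype=np.float32)        # offset to object position

    h, w = instance_masks.shape[1:]
    in_img = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)

    for m, (mask, label, center) in enumerate(
            zip(instance_masks, instance_labels, instance_centers)):
        # A point belongs to instance m if its projection falls inside that mask.
        hit = in_img & mask[uv[:, 1].clip(0, h - 1), uv[:, 0].clip(0, w - 1)]
        labels[hit] = label
        membership[hit] = m
        offsets[hit] = center - points[hit]                  # point-to-object-center offset

    # The augmented points (xyz + categorical label + instance membership + position
    # offset) are then consumed by the downstream point cloud 3D encoder.
    return np.concatenate(
        [points, labels[:, None], membership[:, None], offsets], axis=1)
```

The sketch only illustrates the sequential, instance-level nature of the fusion described in the abstract: 2D instance segmentation runs first, and its outputs are painted onto the point cloud as extra per-point channels rather than fused at the feature-map level.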