In autonomous driving, 3D object detection based on multi-modal data has become indispensable for handling the complex environments around the vehicle. In multi-modal detection, LiDAR and camera are applied simultaneously for capturing and modeling. However, due to the intrinsic discrepancies between LiDAR points and camera images, fusing the two modalities for object detection encounters a series of problems, and most multi-modal detection methods perform even worse than LiDAR-only methods. In this work, we propose a method named PTA-Det to improve the performance of multi-modal detection. As part of PTA-Det, a Pseudo Point Cloud Generation Network is proposed, which converts image information, including texture and semantic features, into pseudo points. Thereafter, through a transformer-based Point Fusion Transition (PFT) module, the features of LiDAR points and image-derived pseudo points can be deeply fused under a unified point-based representation. The combination of these modules overcomes the major obstacle to cross-modal feature fusion and realizes a complementary and discriminative representation for proposal generation. Extensive experiments on the KITTI dataset show that PTA-Det achieves competitive results and support its effectiveness.