Many LiDAR-based methods have been reported to perform well when detecting large objects, handling single-class detection, or operating under easy conditions. However, their performance on small objects or under hard conditions does not surpass that of fusion-based methods, because they fail to leverage image semantics. To improve detection performance in complex environments, this paper proposes a deep learning (DL)-based, fusion-driven multi-class 3D object detection network, named Voxel-Pixel Fusion Network (VPFNet), which takes both LiDAR and camera data streams as input. The key novel component of this network is the Voxel-Pixel Fusion (VPF) layer, which exploits the geometric relation between a voxel-pixel pair and fuses the voxel features with the pixel features through appropriate mechanisms. Moreover, several parameters are specifically designed to guide and strengthen the fusion, taking the characteristics of voxel-pixel pairs into account. Finally, the proposed method is evaluated on the KITTI benchmark for multi-class 3D object detection at multiple difficulty levels and outperforms all state-of-the-art methods in mean average precision (mAP). Notably, our approach ranks first on the KITTI leaderboard for the challenging pedestrian class.
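To make the fusion idea concrete, below is a minimal, hypothetical PyTorch sketch of how a voxel-pixel pair could be formed by projecting a voxel center into the image plane and fusing the two feature vectors. The projection, nearest-pixel sampling, and concatenation-plus-MLP fusion are illustrative assumptions only, not the paper's actual VPF layer.

```python
# Hypothetical sketch of a voxel-pixel fusion step (NOT the paper's VPF layer).
# The projection, feature sampling, and concat+MLP fusion are assumptions.
import torch
import torch.nn as nn


class VoxelPixelFusionSketch(nn.Module):
    """Fuses per-voxel LiDAR features with image features sampled at the
    pixel each voxel center projects to (assumed fusion mechanism)."""

    def __init__(self, voxel_dim: int, pixel_dim: int, out_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(voxel_dim + pixel_dim, out_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, voxel_feats, voxel_centers, image_feats, proj_mat):
        # voxel_feats:   (N, voxel_dim)  features of N non-empty voxels
        # voxel_centers: (N, 3)          voxel centers in LiDAR coordinates
        # image_feats:   (C, H, W)       camera feature map, C == pixel_dim
        # proj_mat:      (3, 4)          LiDAR-to-image projection matrix
        C, H, W = image_feats.shape
        ones = torch.ones(voxel_centers.shape[0], 1, device=voxel_centers.device)
        homo = torch.cat([voxel_centers, ones], dim=1)       # (N, 4) homogeneous coords
        uvw = homo @ proj_mat.T                               # (N, 3)
        uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)          # (N, 2) pixel coordinates

        # Clamp to the image bounds and gather the corresponding pixel features.
        u = uv[:, 0].round().long().clamp(0, W - 1)
        v = uv[:, 1].round().long().clamp(0, H - 1)
        pixel_feats = image_feats[:, v, u].T                  # (N, C)

        # Fuse each voxel-pixel pair (assumed: concatenation + shared MLP).
        return self.mlp(torch.cat([voxel_feats, pixel_feats], dim=1))
```

The key design point the sketch illustrates is that the pairing is purely geometric: each voxel contributes exactly one sampled pixel feature, so the fusion operates per voxel-pixel pair rather than over whole feature maps.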