In recent years, 3D object detection from LiDAR point clouds has made great progress thanks to the development of deep learning. Although voxel-based and point-based methods are popular in 3D object detection, they usually involve time-consuming operations such as 3D convolutions on voxels or ball queries among points, making the resulting networks unsuitable for time-critical applications. In contrast, 2D view-based methods offer high computational efficiency but usually achieve inferior performance to voxel- or point-based methods. In this work, we present CVFNet, a real-time view-based single-stage 3D object detector. To strengthen cross-view feature learning under demanding efficiency constraints, our framework extracts features from different views and fuses them in an efficient progressive way. We first propose a novel Point-Range feature fusion module that deeply integrates point-view and range-view features over multiple stages. Then, a special Slice Pillar operation is designed to preserve 3D geometry when transforming the resulting deep point-view features into bird's-eye view. To better balance the sample ratio, a sparse pillar detection head is introduced that focuses detection on nonempty grids. We conduct experiments on the popular KITTI and NuScenes benchmarks, and state-of-the-art performance is achieved in terms of both accuracy and speed.
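The abstract refers to two view transformations: projecting LiDAR points into a range view and scattering point features onto a bird's-eye-view (BEV) grid. As a minimal sketch of these generic operations (not the paper's actual CVFNet layers; the sensor field of view, image size, and BEV grid parameters below are illustrative assumptions), the two projections can be written as:

```python
import numpy as np

def points_to_range_view(points, h=64, w=512, fov_up=3.0, fov_down=-25.0):
    """Spherical projection of LiDAR points (N, 3) into an (h, w) range image.

    Generic range-view projection; the FOV defaults mimic a typical
    64-beam sensor and are assumptions, not values from the paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(depth, 1e-6), -1.0, 1.0))

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    fov = fov_up_r - fov_down_r

    u = 0.5 * (1.0 - yaw / np.pi) * w            # column from azimuth
    v = (1.0 - (pitch - fov_down_r) / fov) * h   # row from elevation
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int64)

    # Keep the nearest return per pixel: write in order of decreasing depth
    order = np.argsort(-depth)
    range_img = np.zeros((h, w), dtype=np.float32)
    range_img[v[order], u[order]] = depth[order]
    return range_img

def scatter_to_bev(points, feats, x_range=(0.0, 70.4),
                   y_range=(-40.0, 40.0), res=0.16):
    """Max-pool per-point features (N, C) into a BEV grid per cell.

    A simplified stand-in for a pillar-style point-to-BEV transform;
    the grid extent and resolution are assumed, not from the paper.
    """
    nx = int((x_range[1] - x_range[0]) / res)
    ny = int((y_range[1] - y_range[0]) / res)
    ix = ((points[:, 0] - x_range[0]) / res).astype(np.int64)
    iy = ((points[:, 1] - y_range[0]) / res).astype(np.int64)
    mask = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    bev = np.zeros((ny, nx, feats.shape[1]), dtype=np.float32)
    np.maximum.at(bev, (iy[mask], ix[mask]), feats[mask])  # per-cell max
    return bev
```

A detection head operating on the BEV output would then restrict its computation to cells whose feature vector is nonzero, which is the intuition behind detecting only on nonempty grids.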