In this work, we propose a novel two-stage framework for efficient 3D point cloud object detection. Instead of transforming point clouds into 2D bird's-eye-view projections, we parse the raw point cloud directly in 3D space, yet achieve impressive efficiency and accuracy. To this end, we propose dynamic voxelization, a method that voxelizes points at a local scale on the fly. By doing so, we preserve the point cloud geometry with 3D voxels and thereby remove the dependence on expensive MLPs for learning from point coordinates. At the same time, we inherently follow the same processing pattern as point-wise methods (e.g., PointNet) and no longer suffer from the quantization issue of conventional convolutions. For further speed optimization, we propose a grid-based downsampling and voxelization method, and provide different CUDA implementations to accommodate the differing requirements of the training and inference phases. We demonstrate our efficiency on the KITTI 3D object detection dataset at 75 FPS and on the Waymo Open dataset at 25 FPS inference speed, with satisfactory accuracy.
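The core idea of voxelizing points on the fly can be illustrated with a minimal NumPy sketch. This is an illustrative assumption of the general technique (quantize coordinates to voxel indices, then scatter-aggregate points per occupied voxel), not the paper's actual CUDA implementation; the function name and the mean aggregation are our own choices:

```python
import numpy as np

def dynamic_voxelize(points, voxel_size):
    """Assign every point to a voxel on the fly: no fixed per-voxel point
    budget, no dropped points (illustrative sketch, not the paper's code)."""
    # Integer voxel coordinates for each point.
    coords = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # Unique occupied voxels and an inverse map: point i -> voxel inverse[i].
    voxels, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # ensure a flat index array across NumPy versions
    # Scatter-mean of point coordinates into their voxels.
    sums = np.zeros((len(voxels), 3))
    np.add.at(sums, inverse, points[:, :3])
    counts = np.bincount(inverse, minlength=len(voxels))
    centroids = sums / counts[:, None]
    return voxels, inverse, centroids

# Usage: three points, 0.2 m voxels; the first two fall in the same voxel.
pts = np.array([[0.05, 0.05, 0.05],
                [0.15, 0.10, 0.10],
                [0.25, 0.05, 0.05]])
vox, inv, cent = dynamic_voxelize(pts, voxel_size=0.2)
```

Because the voxel set is built from the points actually present, memory scales with the number of occupied voxels rather than with the full grid.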