LiDAR segmentation is crucial for autonomous driving perception. Recent trends favor point- or voxel-based methods as they often yield better performance than the traditional range view representation. In this work, we unveil several key factors in building powerful range view models. We observe that the "many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments to effective learning from range view projections. We present RangeFormer -- a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing -- that better handles the learning and processing of LiDAR point clouds from the range view. We further introduce a Scalable Training from Range view (STR) strategy that trains on arbitrary low-resolution 2D range images while still maintaining satisfactory 3D segmentation accuracy. We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.
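To make the "many-to-one" issue concrete, below is a minimal sketch (not the authors' implementation) of the standard spherical range view projection in NumPy. The image size, vertical field of view, and the closest-point-wins tie-breaking rule are assumptions, chosen to roughly match a 64-beam sensor such as the one used in SemanticKITTI; the final print reports how many points collide into already-occupied pixels and are therefore dropped.

```python
# Minimal sketch of spherical range view projection (assumed setup, not the
# authors' code). Illustrates the "many-to-one" mapping: several 3D points
# can land on the same 2D pixel, and only one survives per cell.
import numpy as np

def range_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an N x 3 LiDAR point cloud (x, y, z) onto an H x W range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)           # range per point

    yaw = np.arctan2(y, x)                              # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * W                   # column index
    v = (1.0 - (pitch - fov_down) / fov) * H            # row index
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    # "Many-to-one": write farthest points first so the closest point per
    # pixel wins; all other points falling on that pixel are discarded.
    order = np.argsort(r)[::-1]
    img = np.full((H, W), -1.0, dtype=np.float32)       # -1 marks empty pixels
    img[v[order], u[order]] = r[order]
    return img, (v, u)

pts = np.random.randn(100000, 3) * 10.0                 # toy point cloud
img, (v, u) = range_projection(pts)
n_dropped = len(pts) - int((img >= 0).sum())
print(f"{n_dropped} points collided into occupied pixels and were dropped")
```

Because the discarded points still need labels at inference time, methods working in the range view typically pair such a projection with a post-processing step that propagates pixel predictions back to all points sharing that pixel, which is part of what the abstract's "full-cycle" framing refers to.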