3D scene understanding is a critical yet challenging task in autonomous driving due to the irregularity and sparsity of LiDAR data, as well as the computational demands of processing large-scale point clouds. Recent methods leverage range-view representations to enhance efficiency, but they often adopt higher azimuth resolutions to mitigate the information loss incurred during spherical projection, where only the closest point is retained for each 2D grid cell. However, processing wide panoramic range-view images remains inefficient and may introduce additional distortions. Our empirical analysis shows that training with multiple range images, obtained by splitting the full point cloud, improves both segmentation accuracy and computational efficiency. This approach, however, introduces new challenges: exacerbated class imbalance and an increase in projection artifacts. To address these, we introduce FLARES, a novel training paradigm that incorporates two tailored data augmentation techniques and a specialized post-processing method designed for multi-range settings. Extensive experiments demonstrate that FLARES generalizes well across different architectures, yielding 2.1%~7.9% mIoU improvements on SemanticKITTI and 1.8%~3.9% on nuScenes, while delivering over a 40% speed-up in inference.
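For context, the spherical projection mentioned above is the standard range-view rasterization used in LiDAR segmentation: each point is mapped to an image cell by its azimuth and elevation, and when several points fall into the same cell only the closest one is kept, which is the source of the information loss the abstract refers to. The sketch below illustrates that projection under common assumptions (a Velodyne HDL-64E-style vertical field of view and a 64x2048 image); it is not the FLARES pipeline itself, and the function name and parameters are illustrative.

```python
import numpy as np

def spherical_projection(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image.

    Illustrative sketch of the standard range-view projection: per pixel,
    only the closest point survives. FOV values assume a Velodyne
    HDL-64E-style sensor (SemanticKITTI convention).
    """
    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)

    yaw = np.arctan2(y, x)                         # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / depth, -1.0, 1.0))

    # Map angles to image coordinates (column = azimuth, row = elevation).
    u = 0.5 * (1.0 - yaw / np.pi) * W
    v = (1.0 - (pitch - fov_down_rad) / fov) * H
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    # Write far points first so the closest point per pixel is kept last.
    order = np.argsort(depth)[::-1]
    range_image = np.full((H, W), -1.0, dtype=np.float32)
    range_image[v[order], u[order]] = depth[order]
    return range_image
```

With a wider image (larger W) fewer points collide in the same cell, which is why prior work raises the azimuth resolution; the abstract's observation is that splitting the point cloud into several narrower range images is a more efficient alternative to this widening.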