LiDAR semantic segmentation, essential for advanced autonomous driving, must be accurate, fast, and easy to deploy on mobile platforms. Previous point-based and sparse voxel-based methods are far from real time because they rely on time-consuming neighbor search or sparse 3D convolution. Recent 2D projection-based methods, including range-view and multi-view fusion, can run in real time but suffer from lower accuracy due to information loss during the 2D projection. Moreover, to improve performance, previous methods usually adopt test-time augmentation (TTA), which further slows down inference. To achieve a better speed-accuracy trade-off, we propose the Cascade Point-Grid Fusion Network (CPGNet), which ensures both effectiveness and efficiency mainly through two techniques: 1) the novel Point-Grid (PG) fusion block extracts semantic features mainly on the 2D projected grid for efficiency, while summarizing both 2D and 3D features on the 3D points for minimal information loss; 2) the proposed transformation consistency loss narrows the gap between single-pass model inference and TTA. Experiments on the SemanticKITTI and nuScenes benchmarks demonstrate that CPGNet, without ensemble models or TTA, is comparable to the state-of-the-art RPVNet while running 4.7 times faster.
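To make the two ingredients concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation) of a point-grid fusion step and a transformation consistency term. The scatter/gather functions, grid size, extent, and loss form are all illustrative assumptions: points are scattered onto a 2D grid by max pooling, the grid features are gathered back per point, and the 2D and 3D features are concatenated on the points; the consistency term penalizes disagreement between predictions on an input and its augmented copy.

```python
import numpy as np

def point_to_grid(points_xy, point_feats, grid_size=(4, 4), extent=2.0):
    """Scatter per-point features onto a 2D grid via max pooling (illustrative sketch).

    points_xy:   (N, 2) coordinates assumed to lie in [-extent, extent)
    point_feats: (N, C) per-point 3D features
    """
    H, W = grid_size
    # Map continuous coordinates to integer grid indices.
    ix = ((points_xy[:, 0] + extent) / (2 * extent) * W).astype(int).clip(0, W - 1)
    iy = ((points_xy[:, 1] + extent) / (2 * extent) * H).astype(int).clip(0, H - 1)
    grid = np.full((H, W, point_feats.shape[1]), -np.inf)
    for n in range(len(points_xy)):
        # Max-pool features of points falling into the same cell.
        grid[iy[n], ix[n]] = np.maximum(grid[iy[n], ix[n]], point_feats[n])
    grid[np.isinf(grid)] = 0.0  # empty cells get zero features
    return grid, iy, ix

def grid_to_point(grid, iy, ix, point_feats):
    """Gather 2D grid features back to each point and fuse with 3D point features."""
    gathered = grid[iy, ix]  # (N, C) 2D semantic features per point
    return np.concatenate([gathered, point_feats], axis=1)

def transformation_consistency_loss(logits_orig, logits_aug):
    """Hypothetical consistency term: mean squared difference between the
    softmax predictions on the original and the augmented input."""
    def softmax(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    return float(np.mean((softmax(logits_orig) - softmax(logits_aug)) ** 2))
```

In the full network, the grid branch would run 2D convolutions between the scatter and gather steps; this sketch only shows the data flow that lets the point branch keep per-point information the 2D projection would otherwise lose.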