Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

Whilst the availability of 3D LiDAR point cloud data has significantly grown in recent years, annotation remains expensive and time-consuming, leading to a demand for semi-supervised semantic segmentation methods in application domains such as autonomous driving. Existing work very often employs relatively large segmentation backbone networks to improve segmentation accuracy, at the expense of computational cost. In addition, many approaches use uniform sampling to reduce the amount of ground-truth data needed for learning, often resulting in sub-optimal performance. To address these issues, we propose a new pipeline that employs a smaller architecture and requires fewer ground-truth annotations to achieve superior segmentation accuracy compared to contemporary approaches. This is facilitated via a novel Sparse Depthwise Separable Convolution module that significantly reduces the network parameter count while retaining overall task performance. To effectively sub-sample our training data, we propose a new Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method that leverages knowledge of sensor motion within the environment to extract a more diverse subset of training data frame samples. To make the most of limited annotated data samples, we further propose a soft pseudo-label method informed by LiDAR reflectivity. Our method outperforms contemporary semi-supervised work in terms of mIoU, using less labeled data, on the SemanticKITTI (59.5@5%) and ScribbleKITTI (58.1@5%) benchmark datasets, based on a 2.3x reduction in model parameters and 641x fewer multiply-add operations, whilst also demonstrating significant performance improvement on limited training data (i.e., Less is More).
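
To give a feel for why a depthwise separable factorization reduces parameter count, here is a minimal sketch of the per-layer arithmetic. It assumes the generic, dense form of the technique (a per-channel k×k×k depthwise convolution followed by a 1×1×1 pointwise convolution, biases ignored); it is not the paper's Sparse Depthwise Separable Convolution module, and the function names and channel sizes are hypothetical, chosen only for illustration.

```python
def conv3d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard 3D convolution (bias ignored)."""
    return k ** 3 * c_in * c_out


def depthwise_separable_conv3d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a depthwise (k x k x k, one filter per input channel)
    plus pointwise (1 x 1 x 1) convolution pair (bias ignored)."""
    depthwise = k ** 3 * c_in   # one k^3 filter per input channel
    pointwise = c_in * c_out    # 1x1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer size, chosen only for illustration.
c_in, c_out = 64, 64
standard = conv3d_params(c_in, c_out)
separable = depthwise_separable_conv3d_params(c_in, c_out)
ratio = standard / separable

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"per-layer reduction: {ratio:.1f}x")
```

This is a single-layer figure for an assumed layer size, so it is not directly comparable to the 2.3x reduction quoted above, which refers to the parameter count of the whole model.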
