We study the problem of efficient semantic segmentation for large-scale 3D point clouds. Because they rely on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches can only be trained on and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture that directly infers per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation- and memory-efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module that progressively increases the receptive field of each 3D point, thereby effectively preserving geometric details. Extensive experiments show that RandLA-Net can process 1 million points in a single pass, up to 200× faster than existing approaches. Moreover, RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks, Semantic3D and SemanticKITTI.
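To illustrate why random sampling is the cheap choice here, below is a minimal NumPy sketch, not the authors' implementation: the function name, array layout, and decimation ratio are assumptions for demonstration. Selecting K random indices costs O(K) regardless of cloud size, whereas farthest point sampling is roughly O(N²) and becomes the bottleneck on million-point clouds.

```python
import numpy as np

def random_sample(points: np.ndarray, num_out: int) -> np.ndarray:
    """Randomly subsample a point cloud.

    points:  (N, 3 + C) array of xyz coordinates plus C per-point features.
    num_out: number of points to keep.

    Drawing indices uniformly at random is independent of the spatial
    distribution of the points, so its cost does not grow with the
    geometric complexity of the scene.
    """
    idx = np.random.choice(points.shape[0], size=num_out, replace=False)
    return points[idx]

# Example: decimate a 1M-point cloud by 4x (hypothetical xyz + rgb layout).
cloud = np.random.rand(1_000_000, 6).astype(np.float32)
sub = random_sample(cloud, 250_000)
print(sub.shape)  # (250000, 6)
```

Because such a sampler can drop thin structures by chance, the paper pairs it with the local feature aggregation module described above, which widens each surviving point's receptive field so that discarded geometry is still summarized by its neighbors.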