The rapid progress in 3D scene understanding has come with growing demand for data; however, collecting and annotating 3D scenes (e.g. point clouds) is notoriously hard. For example, the number of scenes (e.g. indoor rooms) that can be accessed and scanned might be limited; even given sufficient data, acquiring 3D labels (e.g. instance masks) requires intensive human labor. In this paper, we explore data-efficient learning for 3D point clouds. As a first step in this direction, we propose Contrastive Scene Contexts, a 3D pre-training method that makes use of both point-level correspondences and spatial contexts in a scene. Our method achieves state-of-the-art results on a suite of benchmarks where training data or labels are scarce. Our study reveals that exhaustive labeling of 3D point clouds might be unnecessary; remarkably, on ScanNet, even using 0.1% of point labels, we still achieve 89% (instance segmentation) and 96% (semantic segmentation) of the baseline performance obtained with full annotations.
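The abstract only names the idea at a high level. As a purely illustrative aid, the sketch below shows one way a spatially partitioned point-level contrastive (InfoNCE) loss could be organized: matched point pairs from two views of a scene are bucketed into spatial partitions (here, by angle and distance relative to the scene center), and an InfoNCE loss is computed within each partition and averaged. The function name, the binning scheme, and the temperature are all assumptions made for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def scene_context_contrastive_loss(feats_a, feats_b, coords_a,
                                   num_angle_bins=4, num_dist_bins=2,
                                   temperature=0.4):
    """Illustrative sketch (not the paper's code) of a partitioned
    point-level InfoNCE loss.

    feats_a, feats_b: (N, D) features of N matched points from two views.
    coords_a:         (N, 3) coordinates of the matched points in view A,
                      used to bucket pairs by spatial context.
    """
    # Bucket each matched pair by its position relative to the scene
    # center: azimuthal angle and radial distance define the partitions.
    center = coords_a.mean(dim=0, keepdim=True)
    rel = coords_a - center
    angle = torch.atan2(rel[:, 1], rel[:, 0])  # in (-pi, pi]
    angle_bin = ((angle + torch.pi) / (2 * torch.pi) * num_angle_bins)
    angle_bin = angle_bin.long().clamp(max=num_angle_bins - 1)
    dist = rel.norm(dim=1)
    dist_bin = (dist > dist.median()).long() if num_dist_bins == 2 \
        else torch.zeros_like(angle_bin)
    partition = angle_bin * num_dist_bins + dist_bin

    # Within each partition, the matched pair is the positive and all
    # other pairs in the same partition serve as negatives.
    losses = []
    for p in partition.unique():
        idx = (partition == p).nonzero(as_tuple=True)[0]
        if idx.numel() < 2:
            continue
        fa = F.normalize(feats_a[idx], dim=1)
        fb = F.normalize(feats_b[idx], dim=1)
        logits = fa @ fb.t() / temperature  # (n, n) pairwise similarities
        labels = torch.arange(idx.numel(), device=logits.device)
        losses.append(F.cross_entropy(logits, labels))  # diagonal = positives
    return torch.stack(losses).mean()
```

In a pre-training setup of this kind, a 3D backbone (e.g. a sparse convolutional U-Net) would produce per-point features for two augmented views of the same scene, and the loss would be applied to the features at matched point indices.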