Transfer learning is a proven technique in 2D computer vision: it leverages the large amount of data available to achieve high performance on datasets that are limited in size by the cost of acquisition or annotation. In 3D, annotation is known to be costly; nevertheless, transfer learning methods have only recently been investigated. Unsupervised pre-training has been heavily favored because no very large annotated dataset is available. In this work, we tackle real-time 3D semantic segmentation of sparse outdoor LiDAR scans. Datasets for this task have been on the rise, but they use different label sets even for the same task. We propose an intermediate-level label set, called the coarse labels, which allows all available data to be leveraged without any manual labeling. This gives us access to a larger dataset together with a simpler semantic segmentation task. With it, we introduce a new pre-training task: coarse label pre-training, also called COLA. We thoroughly analyze the impact of COLA on various datasets and architectures and show that it yields a noticeable performance improvement, especially when only a small dataset is available for the fine-tuning task.
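To illustrate the idea of coarse labels, the following minimal sketch shows how two datasets with different fine label sets could be remapped to one shared coarse label set, so that their scans can be pooled for pre-training without manual relabeling. Every name in it is an assumption made for illustration: the coarse categories, the two dataset label tables, and the helper `to_coarse` are hypothetical and are not taken from this work.

```python
# Hypothetical coarse label set; the source does not list its categories.
COARSE_CLASSES = ["ignore", "vehicle", "human", "ground", "structure", "nature"]
COARSE_ID = {name: i for i, name in enumerate(COARSE_CLASSES)}

# Hypothetical fine-to-coarse tables for two imaginary datasets
# that annotate the same task with different label sets.
DATASET_A_TO_COARSE = {
    "car": "vehicle",
    "truck": "vehicle",
    "pedestrian": "human",
    "road": "ground",
    "sidewalk": "ground",
    "building": "structure",
    "vegetation": "nature",
}
DATASET_B_TO_COARSE = {
    "vehicle.car": "vehicle",
    "vehicle.bus": "vehicle",
    "human.adult": "human",
    "flat.driveable": "ground",
    "static.manmade": "structure",
    "static.vegetation": "nature",
}


def to_coarse(fine_labels, table):
    """Map per-point fine label names to coarse class ids.

    Fine labels missing from the table fall back to the 'ignore' class.
    """
    return [COARSE_ID[table.get(name, "ignore")] for name in fine_labels]


# One toy "scan" per dataset, given as per-point fine label names.
scan_a = ["car", "road", "pedestrian", "pole"]
scan_b = ["vehicle.bus", "flat.driveable", "static.vegetation"]

# After remapping, both scans share one label space and can be pooled.
pretraining_targets = (
    to_coarse(scan_a, DATASET_A_TO_COARSE) + to_coarse(scan_b, DATASET_B_TO_COARSE)
)
print(pretraining_targets)
```

In this sketch the remapping is a pure lookup, which is why no manual annotation is needed. A network pre-trained on the pooled coarse targets would then be fine-tuned on the target dataset's own fine label set.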