Transfer learning is a proven technique in 2D computer vision for leveraging the large amount of data available and achieving high performance on datasets whose size is limited by the cost of acquisition or annotation. In 3D, annotation is known to be a costly task; nevertheless, pre-training methods have only recently been investigated. Because of this cost, unsupervised pre-training has been heavily favored. In this work, we tackle real-time 3D semantic segmentation of sparse autonomous driving LiDAR scans. Such datasets have been released at an increasing pace, but each comes with its own label set. We propose an intermediate-level label set, called coarse labels, which can easily be applied to any existing or future autonomous driving dataset, thus allowing all the available data to be leveraged at once without any additional manual labeling. This gives us access to a larger dataset, together with a simple semantic segmentation task. On top of it, we introduce a new pre-training task: coarse label pre-training, or COLA. We thoroughly analyze the impact of COLA on various datasets and architectures and show that it yields a noticeable performance improvement, especially when only a small dataset is available for the fine-tuning task.
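To make the coarse-label idea concrete, the sketch below remaps each dataset's fine labels onto one shared coarse label set so scans from different datasets can be pooled for a single semantic segmentation pre-training task. The coarse classes, the dataset names, and the fine-to-coarse mappings are hypothetical illustrations, not the exact definitions used in the paper.

```python
# Minimal sketch (assumed, illustrative): map dataset-specific fine labels to a
# shared coarse label set so several autonomous driving datasets can be pooled
# for coarse label pre-training. Class names and mappings are hypothetical.
import numpy as np

COARSE = {"vehicle": 0, "human": 1, "ground": 2, "structure": 3, "vegetation": 4, "other": 5}

# Illustrative fine-to-coarse mappings for two datasets with different label sets.
SEMANTICKITTI_TO_COARSE = {"car": "vehicle", "bicycle": "vehicle", "person": "human",
                           "road": "ground", "sidewalk": "ground", "building": "structure",
                           "terrain": "ground", "vegetation": "vegetation"}
NUSCENES_TO_COARSE = {"car": "vehicle", "truck": "vehicle", "pedestrian": "human",
                      "driveable_surface": "ground", "manmade": "structure",
                      "vegetation": "vegetation"}

def to_coarse(fine_labels, fine_names, mapping):
    """Remap per-point fine label ids to coarse ids via a fine-name -> coarse-name mapping."""
    lut = np.full(len(fine_names), COARSE["other"], dtype=np.int64)
    for fine_id, name in enumerate(fine_names):
        lut[fine_id] = COARSE[mapping.get(name, "other")]
    return lut[fine_labels]

# Usage: remap one scan's labels; pooled remapped scans form the pre-training set.
kitti_names = ["car", "bicycle", "person", "road", "sidewalk", "building", "terrain", "vegetation"]
kitti_labels = np.array([0, 2, 3, 5, 6])              # per-point fine label ids for one scan
print(to_coarse(kitti_labels, kitti_names, SEMANTICKITTI_TO_COARSE))  # -> [0 1 2 3 2]
```

After pre-training a segmentation network on these pooled coarse labels, the backbone is fine-tuned on the target dataset's original fine label set, which is where the reported gains on small fine-tuning datasets are observed.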