Collecting and labeling registered 3D point clouds is costly. As a result, 3D training resources are typically limited in quantity compared with their 2D image counterparts. In this work, we address the data-scarcity challenge of 3D tasks by transferring knowledge from strong 2D models via RGB-D images. Specifically, we use a strong, well-trained 2D semantic segmentation model to augment RGB-D images with pseudo-labels. The augmented dataset can then be used to pre-train 3D models. Finally, by simply fine-tuning on a few labeled 3D instances, our method already outperforms the existing state of the art tailored for 3D label efficiency. We also show that the results of mean-teacher and entropy-minimization methods are improved by our pre-training, suggesting that the transferred knowledge is helpful in the semi-supervised setting. We verify the effectiveness of our approach on two popular 3D models and three different tasks. On the ScanNet official evaluation, we establish new state-of-the-art semantic segmentation results on the data-efficient track.
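The core transfer step, lifting per-pixel pseudo-labels from a 2D segmentation model into a labeled 3D point cloud via depth, can be sketched as follows. This is a minimal illustration using standard pinhole-camera back-projection; the function name and the toy inputs are hypothetical and do not reflect the paper's exact pipeline.

```python
import numpy as np

def lift_pseudo_labels(depth, labels, K):
    """Back-project per-pixel pseudo-labels from an RGB-D frame into a
    labeled 3D point cloud using the pinhole camera model.

    depth:  (H, W) depth map in meters (0 marks invalid pixels)
    labels: (H, W) pseudo-labels predicted by a 2D segmentation model
    K:      (3, 3) camera intrinsic matrix
    Returns (N, 3) 3D points and (N,) labels for the valid pixels.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    valid = depth > 0
    z = depth[valid]
    # Pinhole back-projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    points = np.stack([x, y, z], axis=1)
    return points, labels[valid]

# Toy example: a 2x2 frame where one pixel has no depth reading.
K = np.array([[500.0, 0.0, 1.0],
              [0.0, 500.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.array([[2.0, 0.0],
                  [1.0, 3.0]])
labels = np.array([[5, 5],
                   [7, 7]])
pts, lbl = lift_pseudo_labels(depth, labels, K)  # 3 valid points, labels [5, 7, 7]
```

Aggregating such pseudo-labeled points over the frames of an RGB-D sequence yields the augmented dataset used for 3D pre-training.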