3D point cloud understanding has made great progress in recent years. However, one major bottleneck is the scarcity of annotated real datasets, especially compared to 2D object detection tasks, since annotating real scans of a scene requires a large amount of labor. A promising solution to this problem is to make better use of synthetic datasets, which consist of CAD object models, to boost learning on real datasets. This can be achieved through a pre-training and fine-tuning procedure. However, recent work on 3D pre-training fails when transferring features learned on synthetic objects to real-world applications. In this work, we put forward a new method called RandomRooms to accomplish this objective. In particular, we propose to generate random layouts of a scene from the objects in a synthetic CAD dataset and to learn the 3D scene representation by applying object-level contrastive learning to two random scenes generated from the same set of synthetic objects. A model pre-trained in this way serves as a better initialization for later fine-tuning on the 3D object detection task. Empirically, we show consistent improvement in downstream 3D detection tasks on several base models, especially when less training data is used, which demonstrates the effectiveness and generalization ability of our method. Benefiting from the rich semantic knowledge and diverse objects in synthetic data, our method establishes a new state of the art on the widely used 3D detection benchmarks ScanNetV2 and SUN RGB-D. We expect our attempt to provide a new perspective for bridging object-level and scene-level 3D understanding.
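The pre-training idea described above can be illustrated with a minimal NumPy sketch. It is a simplified illustration under stated assumptions, not the RandomRooms implementation: the function names (`random_scene`, `toy_encoder`, `object_features`, `info_nce`), the room size, the temperature, and the tanh projection standing in for a real point cloud backbone are all hypothetical, and object overlap is ignored. Two scenes are built from the same objects with different random placements, per-point features are pooled per object, and an InfoNCE-style loss treats the same object in the two scenes as a positive pair.

```python
import numpy as np


def random_scene(objects, rng, room_size=5.0):
    """Place each object with a random yaw and floor position (hypothetical layout rule).

    Returns the scene points (N, 3) and the per-point object index (N,).
    """
    points, ids = [], []
    for i, obj in enumerate(objects):
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        shift = np.array([rng.uniform(0.0, room_size), rng.uniform(0.0, room_size), 0.0])
        points.append(obj @ rot.T + shift)  # rotate about z, then translate on the floor
        ids.append(np.full(len(obj), i))
    return np.concatenate(points), np.concatenate(ids)


def toy_encoder(points, weights):
    """Placeholder per-point encoder; a real backbone network would go here."""
    return np.tanh(points @ weights)


def object_features(point_feats, ids, num_objects):
    """Average per-point features within each object, then L2-normalise."""
    feats = np.stack([point_feats[ids == i].mean(axis=0) for i in range(num_objects)])
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)


def info_nce(feats_a, feats_b, temperature=0.1):
    """InfoNCE loss: object i in scene A should match object i in scene B."""
    logits = feats_a @ feats_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


rng = np.random.default_rng(0)
# Stand-ins for CAD models: three random point sets of 50 points each.
objects = [rng.normal(size=(50, 3)) for _ in range(3)]
weights = rng.normal(size=(3, 16))

# Two random layouts generated from the same set of objects.
pts_a, ids_a = random_scene(objects, rng)
pts_b, ids_b = random_scene(objects, rng)

feats_a = object_features(toy_encoder(pts_a, weights), ids_a, len(objects))
feats_b = object_features(toy_encoder(pts_b, weights), ids_b, len(objects))
loss = info_nce(feats_a, feats_b)
print("contrastive loss:", loss)
```

In an actual training loop the encoder weights would be updated to minimise this loss, so that an object's representation stays consistent across layouts; the sketch only computes the loss once for a fixed random encoder.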