Recently, great progress has been made in 3D deep learning with the emergence of deep neural networks specifically designed for 3D point clouds. These networks are often trained from scratch or initialized from models pre-trained purely on point cloud data. Inspired by the success of deep learning in the image domain, we devise a novel pre-training technique for better model initialization by utilizing the multi-view rendering of the 3D data. Our pre-training is self-supervised by a local pixel/point-level correspondence loss computed from perspective projection and a global image/point-cloud-level loss based on knowledge distillation, thus effectively improving upon popular point cloud networks, including PointNet, DGCNN, and SR-UNet. These improved models outperform existing state-of-the-art methods on various datasets and downstream tasks. We also analyze the benefits of synthetic and real data for pre-training, and observe that pre-training on synthetic data is also useful for high-level downstream tasks. Code and pre-trained models are available at https://github.com/VinAIResearch/selfsup_pcd.
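The local correspondence loss relies on projecting each 3D point into a rendered view so that its learned feature can be matched against the image feature at the projected pixel. The following is a minimal sketch of that idea, not the paper's implementation: the function names (`project_points`, `correspondence_loss`), the pinhole camera setup, nearest-neighbour feature sampling, and the plain L2 matching objective are all illustrative assumptions.

```python
import numpy as np

def project_points(points, K, Rt):
    """Project Nx3 world points into pixel coordinates with a pinhole camera.
    K is the 3x3 intrinsic matrix; Rt is the 3x4 extrinsic matrix [R|t].
    Returns (Nx2 pixel coordinates, N camera-frame depths)."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # Nx4 homogeneous
    cam = pts_h @ Rt.T                                          # Nx3 camera frame
    uvw = cam @ K.T                                             # Nx3 homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3], cam[:, 2]

def correspondence_loss(point_feats, feat_map, pixels):
    """Mean squared distance between per-point features (NxC) and image
    features (HxWxC) sampled at the projected pixel locations.
    Uses nearest-neighbour sampling; a real pipeline might interpolate
    and use a contrastive objective instead."""
    h, w = feat_map.shape[:2]
    u = np.clip(np.round(pixels[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(pixels[:, 1]).astype(int), 0, h - 1)
    pix_feats = feat_map[v, u]                                  # NxC gathered features
    return float(np.mean(np.sum((point_feats - pix_feats) ** 2, axis=1)))

# Toy usage: a camera looking down +z, two points 2 m in front of it.
K = np.array([[100.0, 0.0, 16.0],
              [0.0, 100.0, 16.0],
              [0.0,   0.0,  1.0]])
Rt = np.hstack([np.eye(3), np.zeros((3, 1))])
points = np.array([[0.0, 0.0, 2.0],
                   [0.1, 0.0, 2.0]])
pixels, depths = project_points(points, K, Rt)  # first point lands at (16, 16)
```

In the self-supervised setting, `point_feats` would come from the point cloud encoder and `feat_map` from an image encoder applied to the rendered view, so minimizing the loss pulls the two representations into agreement at corresponding locations.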