3D Point Cloud Pre-training with Knowledge Distillation from 2D Images

The recent success of pre-trained 2D vision models is largely attributable to learning from large-scale datasets. Compared with 2D image datasets, however, the data currently available for pre-training on 3D point clouds is limited. To overcome this limitation, we propose a knowledge distillation method that lets 3D point cloud pre-trained models acquire knowledge directly from a 2D representation learning model, specifically the image encoder of CLIP, through concept alignment. We introduce a cross-attention mechanism that extracts concept features from the 3D point cloud and compares them with the semantic information from 2D images. Under this scheme, the point cloud models learn directly from the rich information contained in the 2D teacher model. Extensive experiments demonstrate that the proposed knowledge distillation scheme achieves higher accuracy than state-of-the-art 3D pre-training methods on both synthetic and real-world datasets across downstream tasks, including object classification, object detection, semantic segmentation, and part segmentation.
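The concept-alignment idea in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the learnable `concept_queries`, the single-head scaled dot-product attention, the cosine-distance loss, and the random arrays standing in for point features and for the frozen 2D teacher's image features. The abstract itself only states that cross-attention extracts concept features from the point cloud and that these are compared with semantic information from 2D images.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def extract_concepts(point_feats, concept_queries):
    """Cross-attention: each concept query attends over all point features.

    point_feats:     (N, d) per-point features from a 3D student encoder
    concept_queries: (K, d) hypothetical learnable queries, one per concept
    Returns (K, d) concept features and the (K, N) attention weights.
    """
    d = point_feats.shape[1]
    scores = concept_queries @ point_feats.T / np.sqrt(d)  # (K, N)
    attn = softmax(scores, axis=-1)                        # each row sums to 1
    return attn @ point_feats, attn

def alignment_loss(student_concepts, teacher_feats, eps=1e-8):
    """Mean cosine distance between matched student and teacher rows.

    Cosine distance is an assumed choice; the abstract does not name the loss.
    """
    s = student_concepts / (np.linalg.norm(student_concepts, axis=1, keepdims=True) + eps)
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

rng = np.random.default_rng(0)
N, K, d = 1024, 8, 64                          # points, concepts, feature width
point_feats = rng.standard_normal((N, d))      # stand-in for student point features
concept_queries = rng.standard_normal((K, d))  # stand-in for learned queries
teacher_feats = rng.standard_normal((K, d))    # stand-in for frozen 2D teacher features

concepts, attn = extract_concepts(point_feats, concept_queries)
loss = alignment_loss(concepts, teacher_feats)
print(concepts.shape, loss)
```

In an actual training loop the loss would be minimized with respect to the student encoder and the queries while the 2D teacher stays frozen; the sketch only shows the forward computation.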


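Point cloud

A point cloud obtained by laser measurement contains three-dimensional coordinates (XYZ) and laser reflection intensity (Intensity). A point cloud obtained by photogrammetry contains three-dimensional coordinates (XYZ) and color information (RGB). A point cloud obtained by combining laser measurement and photogrammetry contains three-dimensional coordinates (XYZ), laser reflection intensity (Intensity), and color information (RGB). Once the spatial coordinates of each sampled point on an object's surface have been acquired, the result is a set of points, which is called a "point cloud".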
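The per-point attributes listed above (XYZ, Intensity, RGB) map naturally onto a NumPy structured array. The sketch below is only one possible layout; the field names, the dtypes, and the helper `make_cloud` are hypothetical choices for illustration, not a standard format.

```python
import numpy as np

# One record per point: coordinates, laser intensity, and color.
point_dtype = np.dtype([
    ("xyz", np.float32, (3,)),   # three-dimensional coordinates
    ("intensity", np.float32),   # laser reflection intensity
    ("rgb", np.uint8, (3,)),     # color information
])

def make_cloud(xyz, intensity=None, rgb=None):
    """Build a point cloud; attributes that are not supplied stay zero-filled.

    Zero-filling is a simplification: it cannot distinguish a missing
    attribute from a genuinely zero value.
    """
    xyz = np.asarray(xyz, dtype=np.float32)
    cloud = np.zeros(len(xyz), dtype=point_dtype)
    cloud["xyz"] = xyz
    if intensity is not None:
        cloud["intensity"] = intensity
    if rgb is not None:
        cloud["rgb"] = rgb
    return cloud

# A combined laser + photogrammetry cloud with two points.
cloud = make_cloud(
    xyz=[[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]],
    intensity=[0.5, 0.9],
    rgb=[[255, 0, 0], [0, 255, 0]],
)
print(cloud["xyz"].shape)
```

A laser-only cloud would pass `xyz` and `intensity`, and a photogrammetry-only cloud would pass `xyz` and `rgb`.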
