Point cloud based 3D deep models have wide applications in areas such as autonomous driving and household robotics. Inspired by recent prompt learning in natural language processing, this work proposes a novel Multi-view Vision-Prompt Fusion Network (MvNet) for few-shot 3D point cloud classification. MvNet investigates the possibility of leveraging off-the-shelf 2D pre-trained models to achieve few-shot classification, which alleviates the over-dependence of existing baseline models on large-scale annotated 3D point cloud data. Specifically, MvNet first encodes a 3D point cloud into multi-view image features for a number of different views. A novel multi-view prompt fusion module is then developed to effectively fuse information across views and bridge the gap between 3D point cloud data and 2D pre-trained models. From the fused features, a set of 2D image prompts is derived to provide suitable prior knowledge to a large-scale pre-trained image model for few-shot 3D point cloud classification. Extensive experiments on the ModelNet, ScanObjectNN, and ShapeNet datasets demonstrate that MvNet achieves new state-of-the-art performance for few-shot 3D point cloud classification. The source code of this work will be available soon.
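The pipeline described above — render a point cloud into several 2D views, then fuse the views into a prompt for a pre-trained image model — can be illustrated with a minimal sketch. This is not the paper's implementation: the orthographic depth rendering, the number of views, the resolution, and the mean-based fusion are all simplifying assumptions standing in for MvNet's learned multi-view encoder and prompt fusion module.

```python
import numpy as np

def render_depth_views(points, num_views=4, res=32):
    """Project a point cloud to simple orthographic depth maps from
    viewpoints rotated around the y-axis. (Hypothetical stand-in for
    MvNet's multi-view image-feature encoding.)"""
    views = []
    for v in range(num_views):
        theta = 2 * np.pi * v / num_views
        rot = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                        [0.0,           1.0, 0.0],
                        [-np.sin(theta), 0.0, np.cos(theta)]])
        p = points @ rot.T
        # Normalize x,y into pixel coordinates of a res x res grid.
        xy = ((p[:, :2] - p[:, :2].min(0)) /
              (np.ptp(p[:, :2], axis=0) + 1e-8) * (res - 1)).astype(int)
        depth = np.zeros((res, res))
        z = p[:, 2] - p[:, 2].min()  # depth values, shifted to be >= 0
        for (x, y), d in zip(xy, z):
            depth[y, x] = max(depth[y, x], d)  # keep nearest-surface depth
        views.append(depth)
    return np.stack(views)  # shape: (num_views, res, res)

def fuse_views_to_prompt(views):
    """Toy cross-view fusion: average the view maps into one prompt map.
    The actual fusion module is learned, not a fixed mean."""
    return views.mean(axis=0)

rng = np.random.default_rng(0)
cloud = rng.standard_normal((256, 3))   # synthetic point cloud
views = render_depth_views(cloud)
prompt = fuse_views_to_prompt(views)
print(views.shape, prompt.shape)
```

In the full method, the fused representation would be injected as visual prompts into a frozen large-scale 2D backbone rather than used directly, so only the lightweight prompt parameters need few-shot supervision.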