In recent years, semi-supervised learning has been widely explored and has shown excellent data efficiency on 2D data. There is an emerging need to improve data efficiency for 3D tasks due to the scarcity of labeled 3D data. This paper explores how the coherence of different modalities of 3D data (e.g., point cloud, image, and mesh) can be used to improve data efficiency for both 3D classification and retrieval tasks. We propose a novel multimodal semi-supervised learning framework by introducing an instance-level consistency constraint and a novel multimodal contrastive prototype (M2CP) loss. The instance-level consistency constraint enforces the network to generate consistent representations for multimodal data of the same object, regardless of modality. M2CP maintains a multimodal prototype for each class and learns features with small intra-class variation by minimizing each object's feature distance to its own prototype while maximizing its distance to the other prototypes. Our proposed framework outperforms all state-of-the-art counterparts by a large margin on both classification and retrieval tasks on the ModelNet10 and ModelNet40 datasets.