CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP

Training a 3D scene understanding model requires complicated human annotations, which are laborious to collect and result in a model that encodes only closed-set object semantics. In contrast, vision-language pre-training models (e.g., CLIP) have shown remarkable open-world reasoning properties. To this end, we propose directly transferring CLIP's feature space to a 3D scene understanding model without any form of supervision. We first modify CLIP's input and forwarding process so that it can be adapted to extract dense pixel features for 3D scene contents. We then project multi-view image features to the point cloud and train a 3D scene understanding model with feature distillation. Without any annotations or additional training, our model achieves promising annotation-free semantic segmentation results on open-vocabulary semantics and long-tailed concepts. Besides, serving as a cross-modal pre-training framework, our method can be used to improve data efficiency during fine-tuning. Our model outperforms previous SOTA methods in various zero-shot and data-efficient learning benchmarks. Most importantly, our model successfully inherits CLIP's richly structured knowledge, allowing 3D scene understanding models to recognize not only object concepts but also open-world semantics.
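The projection and distillation steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the pinhole camera model, the plain averaging across views, the cosine-distance loss, and all function names (project_points, aggregate_view_features, cosine_distillation_loss, zero_shot_labels) are assumptions made for illustration. Occlusion handling is omitted, and the dense pixel features and text embeddings are assumed to have been computed beforehand by a CLIP image and text encoder.

```python
import numpy as np


def project_points(points_world, world_to_cam, intrinsics):
    """Project N world-space points into one view with a pinhole model (assumed).

    Returns integer pixel coordinates (N, 2) as (u, v) and camera-space depth (N,).
    """
    ones = np.ones((len(points_world), 1))
    homogeneous = np.concatenate([points_world, ones], axis=1)
    points_cam = (world_to_cam @ homogeneous.T).T[:, :3]
    depth = points_cam[:, 2]
    # Points behind the camera are masked out later; avoid dividing by <= 0 here.
    safe_depth = np.where(depth > 0, depth, 1.0)
    uv = (intrinsics @ points_cam.T).T[:, :2] / safe_depth[:, None]
    return np.round(uv).astype(int), depth


def aggregate_view_features(points_world, views):
    """Average per-pixel features over every view in which a point lands in frame.

    Each view is a dict with 'features' (H, W, C), 'world_to_cam' (4, 4) and
    'intrinsics' (3, 3). Occlusion testing against a depth map is omitted.
    """
    channels = views[0]["features"].shape[-1]
    feature_sum = np.zeros((len(points_world), channels))
    hit_count = np.zeros(len(points_world))
    for view in views:
        height, width, _ = view["features"].shape
        uv, depth = project_points(points_world, view["world_to_cam"], view["intrinsics"])
        visible = (
            (depth > 0)
            & (uv[:, 0] >= 0) & (uv[:, 0] < width)
            & (uv[:, 1] >= 0) & (uv[:, 1] < height)
        )
        feature_sum[visible] += view["features"][uv[visible, 1], uv[visible, 0]]
        hit_count[visible] += 1
    covered = hit_count > 0
    feature_sum[covered] /= hit_count[covered, None]
    return feature_sum, covered


def _normalize(x):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)


def cosine_distillation_loss(student, target, covered):
    """Mean (1 - cosine similarity) over points that received a target feature."""
    s = _normalize(student[covered])
    t = _normalize(target[covered])
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))


def zero_shot_labels(point_features, text_embeddings):
    """Assign each point the index of the most similar text embedding."""
    similarity = _normalize(point_features) @ _normalize(text_embeddings).T
    return np.argmax(similarity, axis=1)
```

In this sketch, a 3D network would be trained so that its per-point output minimizes cosine_distillation_loss against the aggregated 2D features. At inference, zero_shot_labels compares the per-point features with text embeddings of arbitrary class names, which is what makes annotation-free segmentation possible.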

