3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion


March 21, 2023


Yu-Jhe Li, Kris Kitani

From arXiv, 15 pages. Non-CMU authors are currently hidden because an internal legal review is in progress at their company.

We tackle the task of text-to-3D creation with pre-trained latent-based NeRFs (NeRFs that generate 3D objects given an input latent code). Recent works such as DreamFusion and Magic3D have shown great success in generating 3D content using NeRFs and text prompts, but the current approach of optimizing a NeRF for every text prompt 1) is extremely time-consuming and 2) often leads to low-resolution outputs. To address these challenges, we propose a novel method named 3D-CLFusion, which leverages pre-trained latent-based NeRFs and performs fast 3D content creation in less than a minute. In particular, we introduce a latent diffusion prior network for learning the w latent from the input CLIP text/image embeddings. This pipeline allows us to produce the w latent without further optimization during inference, and the pre-trained NeRF is able to perform multi-view high-resolution 3D synthesis based on the latent. The novelty of our model lies in introducing contrastive learning while training the diffusion prior, which enables the generation of a valid view-invariant latent code. We demonstrate through experiments the effectiveness of our proposed view-invariant diffusion process for fast text-to-3D creation, e.g., 100 times faster than DreamFusion. Our model can serve as a plug-and-play tool for text-to-3D with pre-trained NeRFs.
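The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common choice, an InfoNCE-style loss, under stated assumptions: the diffusion prior has already produced one w latent per object from each of two different views, latents of the same object form positive pairs, and latents of different objects in the batch form negatives. The function names, the temperature value, and the batch layout are all hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to (approximately) unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def view_invariant_info_nce(w_view_a, w_view_b, temperature=0.1):
    """InfoNCE-style loss over predicted latents (hypothetical sketch).

    w_view_a, w_view_b: arrays of shape (N, D). Row i of each array is the
    latent predicted for object i from a different view (an assumption).
    Matching rows are positives; all other rows in the batch are negatives.
    """
    a = l2_normalize(np.asarray(w_view_a, dtype=np.float64))
    b = l2_normalize(np.asarray(w_view_b, dtype=np.float64))

    # Cosine similarity between every latent in view A and every one in view B.
    logits = (a @ b.T) / temperature

    # Subtract the row-wise maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))
```

Minimizing a loss of this shape pulls latents of the same object together across views and pushes latents of different objects apart, which is one way a view-invariant latent code could be encouraged. How the paper combines such a term with the diffusion training loss is not stated in the abstract.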

