Recent CLIP-guided 3D optimization methods, such as DreamFields and PureCLIPNeRF, have achieved impressive results in zero-shot text-to-3D synthesis. However, because they train from scratch with random initialization and no prior knowledge of 3D shape, these methods often fail to generate accurate and faithful 3D structures that conform to the input text. In this paper, we make the first attempt to introduce explicit 3D shape priors into the CLIP-guided 3D optimization process. Specifically, we first generate a high-quality 3D shape from the input text as a 3D shape prior in the text-to-shape stage. We then use it to initialize a neural radiance field, which we optimize with the full text prompt. To address the challenging text-to-shape generation task, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. To narrow the style domain gap between the images synthesized by the text-to-image diffusion model and the shape renderings used to train the image-to-shape generator, we further propose to jointly optimize a learnable text prompt and fine-tune the text-to-image diffusion model for rendering-style image generation. Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy compared to state-of-the-art methods.
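To make the two-stage structure concrete, below is a minimal, runnable sketch of the idea in PyTorch, not the authors' released implementation. Everything in it is a stand-in assumption: `TinyNeRF` is a toy radiance field, `shape_prior_occupancy` (a unit sphere here) stands in for the output of the text-to-shape stage, and `dummy_prompt_loss` stands in for the CLIP image-text similarity computed on rendered views of the full prompt.

```python
# Sketch of the two-stage pipeline: (1) distill an explicit 3D shape prior
# into a radiance field, (2) refine the prior-initialized field under text
# guidance. All components here are toy placeholders, not the Dream3D code.

import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Toy radiance field: an MLP mapping 3D points to (density, RGB)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density channel + 3 color channels
        )

    def forward(self, xyz):
        out = self.mlp(xyz)
        return torch.relu(out[..., :1]), torch.sigmoid(out[..., 1:])

def shape_prior_occupancy(xyz):
    # Placeholder for the text-to-shape stage: a unit-radius-0.5 sphere.
    return (xyz.norm(dim=-1, keepdim=True) < 0.5).float()

def distill_prior(nerf, steps=200, lr=1e-3):
    """Stage 1 hand-off: fit the NeRF density to the prior occupancy, so
    stage-2 optimization starts from an explicit 3D structure, not scratch."""
    opt = torch.optim.Adam(nerf.parameters(), lr=lr)
    for _ in range(steps):
        xyz = torch.rand(4096, 3) * 2 - 1  # random points in [-1, 1]^3
        density, _ = nerf(xyz)
        # Squash density to [0, 1) so it is comparable to binary occupancy.
        loss = nn.functional.mse_loss(torch.tanh(density),
                                      shape_prior_occupancy(xyz))
        opt.zero_grad(); loss.backward(); opt.step()

def dummy_prompt_loss(density, rgb):
    # Placeholder for the CLIP loss between rendered views and the prompt.
    return -(density.mean() + rgb.mean())

def optimize_with_prompt(nerf, steps=100, lr=1e-4):
    """Stage 2: refine the prior-initialized field under text guidance."""
    opt = torch.optim.Adam(nerf.parameters(), lr=lr)
    for _ in range(steps):
        xyz = torch.rand(4096, 3) * 2 - 1
        density, rgb = nerf(xyz)
        loss = dummy_prompt_loss(density, rgb)
        opt.zero_grad(); loss.backward(); opt.step()

nerf = TinyNeRF()
distill_prior(nerf)         # initialize from the 3D shape prior
optimize_with_prompt(nerf)  # then optimize with the full text prompt
```

The key design point the sketch illustrates is the hand-off: the shape prior constrains only the density (geometry) before prompt-driven optimization begins, which is why the resulting 3D structure stays faithful to the input text rather than drifting as scratch-trained fields can.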