DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image-space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior, accelerated with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show that 61.7% of raters prefer our approach over DreamFusion. Together with image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues for various creative applications.
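The coarse-to-fine scheme above can be illustrated with a toy sketch. This is not the paper's method (which optimizes a NeRF and then a textured mesh against diffusion-model gradients); it is a minimal, assumed stand-in that shows the structure of two-stage optimization: fit a cheap low-resolution representation first, then use it to initialize a high-resolution refinement. The signal, resolutions, and loss here are hypothetical.

```python
import numpy as np

def refine(params, target, lr=0.5, steps=200):
    """Gradient descent on squared error (stand-in for the diffusion-guided updates)."""
    for _ in range(steps):
        params = params - lr * (params - target)  # gradient of 0.5 * ||params - target||^2
    return params

rng = np.random.default_rng(0)
target_hi = np.sin(np.linspace(0, 2 * np.pi, 64))  # high-res "scene" (hypothetical)
target_lo = target_hi[::4]                         # low-res supervision signal

# Stage 1: coarse optimization at low resolution (cheap and fast).
coarse = refine(rng.normal(size=16), target_lo)

# Stage 2: upsample the coarse result as initialization, refine at high resolution.
fine_init = np.repeat(coarse, 4)
fine = refine(fine_init, target_hi)

print(float(np.abs(fine - target_hi).max()))
```

The point of the second stage is that a good initialization makes high-resolution refinement far cheaper than optimizing from scratch, which is the intuition behind Magic3D's speedup over single-stage optimization.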