DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image-space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior, accelerated with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show that 61.7% of raters prefer our approach over DreamFusion. Together with image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues for various creative applications.
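The coarse-to-fine scheme above can be illustrated with a toy sketch. This is not the paper's method (which optimizes a NeRF and then a textured mesh against diffusion-model gradients); it is a minimal, assumed stand-in that shows the structure of two-stage optimization: fit a cheap low-resolution representation first, then use it to initialize a high-resolution refinement. The signal, resolutions, and loss here are hypothetical.

```python
import numpy as np

def refine(params, target, lr=0.5, steps=200):
    """Gradient descent on squared error (stand-in for the diffusion-guided updates)."""
    for _ in range(steps):
        params = params - lr * (params - target)  # gradient of 0.5 * ||params - target||^2
    return params

rng = np.random.default_rng(0)
target_hi = np.sin(np.linspace(0, 2 * np.pi, 64))  # high-res "scene" (hypothetical)
target_lo = target_hi[::4]                         # low-res supervision signal

# Stage 1: coarse optimization at low resolution (cheap and fast).
coarse = refine(rng.normal(size=16), target_lo)

# Stage 2: upsample the coarse result as initialization, refine at high resolution.
fine_init = np.repeat(coarse, 4)
fine = refine(fine_init, target_hi)

print(float(np.abs(fine - target_hi).max()))
```

The point of the second stage is that a good initialization makes high-resolution refinement far cheaper than optimizing from scratch, which is the intuition behind Magic3D's speedup over single-stage optimization.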