In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it requires estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage prior knowledge from a well-trained 2D diffusion model to act as 3D-aware supervision for 3D creation. Our approach, Make-It-3D, employs a two-stage optimization pipeline: the first stage optimizes a neural radiance field by combining constraints from the reference image at the frontal view with the diffusion prior at novel views; the second stage transforms the coarse model into textured point clouds and further enhances realism with the diffusion prior while leveraging the high-quality textures of the reference image. Extensive experiments demonstrate that our method outperforms prior work by a large margin, producing faithful reconstructions with impressive visual quality. Our method presents the first attempt to achieve high-quality 3D creation from a single image for general objects, and it enables various applications such as text-to-3D creation and texture editing.
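The diffusion-prior supervision described above is commonly realized as a score-distillation-style objective: a rendered novel view is noised, a pretrained 2D diffusion model predicts the injected noise, and the residual between predicted and true noise drives the gradient on the render. The sketch below is a minimal, self-contained illustration of that update in NumPy; `predict_noise` is a hypothetical placeholder standing in for a real pretrained diffusion model, and the alpha schedule is assumed, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(noisy_image, t):
    """Hypothetical stand-in for a pretrained 2D diffusion model's noise
    predictor; a real pipeline would query e.g. a latent diffusion model."""
    return noisy_image * 0.1  # placeholder, NOT a real denoiser

def sds_gradient(rendered, alpha_bar, t):
    """Score-distillation-style gradient on a rendered view:
    noise the render at timestep t, predict the noise, and use the
    weighted residual (predicted - true) as the gradient signal."""
    eps = rng.standard_normal(rendered.shape)
    a_t = alpha_bar[t]
    noisy = np.sqrt(a_t) * rendered + np.sqrt(1.0 - a_t) * eps
    eps_pred = predict_noise(noisy, t)
    w = 1.0 - a_t  # one common timestep weighting choice
    return w * (eps_pred - eps)

# Toy loop: "render" is a stand-in for a novel-view render of the 3D model.
alpha_bar = np.linspace(0.99, 0.01, 1000)  # assumed cumulative schedule
render = rng.standard_normal((8, 8, 3))
lr = 0.01
for _ in range(100):
    t = int(rng.integers(20, 980))
    render -= lr * sds_gradient(render, alpha_bar, t)
```

In a full system the gradient would be backpropagated through the renderer into the radiance-field parameters rather than applied to pixels directly; the sketch only shows the shape of the loss signal.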