Automatic 3D content creation has achieved rapid progress recently, driven by the availability of pre-trained large language models and image diffusion models, forming the emerging topic of text-to-3D content creation. Existing text-to-3D methods commonly use implicit scene representations, which couple geometry and appearance via volume rendering and are suboptimal for recovering finer geometries and achieving photorealistic rendering; consequently, they are less effective for generating high-quality 3D assets. In this work, we propose a new method, Fantasia3D, for high-quality text-to-3D content creation. Key to Fantasia3D is the disentangled modeling and learning of geometry and appearance. For geometry learning, we rely on a hybrid scene representation and propose to encode the surface normals extracted from the representation as the input to the image diffusion model. For appearance modeling, we introduce the spatially varying bidirectional reflectance distribution function (BRDF) into the text-to-3D task and learn the surface material for photorealistic rendering of the generated surface. Our disentangled framework is more compatible with popular graphics engines, supporting relighting, editing, and physical simulation of the generated 3D assets. We conduct thorough experiments that show the advantages of our method over existing ones under different text-to-3D task settings. Project page and source code: https://fantasia3d.github.io/.
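To make the disentangled, two-stage design described above concrete, the following is a minimal sketch of the optimization structure: geometry is first optimized with diffusion guidance applied to rendered surface normals, and appearance (a spatially varying BRDF) is then optimized with the geometry frozen. All names here (HybridGeometry, BRDFAppearance, sds_grad) are illustrative stand-ins, not the authors' implementation; the real method uses a DMTet-style hybrid representation, a differentiable rasterizer, and a pretrained text-conditioned image diffusion model.

```python
# Illustrative sketch only: placeholder modules mimic the *structure* of the
# disentangled geometry/appearance optimization, not Fantasia3D's actual code.
import torch
import torch.nn as nn

class HybridGeometry(nn.Module):
    """Stand-in for a hybrid (SDF + deformable tetrahedra) scene representation."""
    def __init__(self, n_params: int = 4096):
        super().__init__()
        self.params = nn.Parameter(torch.randn(n_params) * 0.01)

    def render_normals(self) -> torch.Tensor:
        # Placeholder: the real method extracts a surface mesh and rasterizes
        # its normals with a differentiable renderer.
        return torch.tanh(self.params).view(1, 1, 64, 64).repeat(1, 3, 1, 1)

class BRDFAppearance(nn.Module):
    """Stand-in for a spatially varying BRDF (e.g. diffuse/roughness/metallic) field."""
    def __init__(self, n_params: int = 4096):
        super().__init__()
        self.params = nn.Parameter(torch.randn(n_params) * 0.01)

    def render_rgb(self, normals: torch.Tensor) -> torch.Tensor:
        # Placeholder for physically based shading of the fixed geometry.
        return torch.sigmoid(self.params).view(1, 1, 64, 64) * normals

def sds_grad(image: torch.Tensor) -> torch.Tensor:
    # Placeholder for the score-distillation gradient obtained from a
    # pretrained text-conditioned image diffusion model.
    return torch.randn_like(image)

geometry, appearance = HybridGeometry(), BRDFAppearance()

# Stage 1: geometry learning, guided by renderings of surface normals.
opt_g = torch.optim.Adam(geometry.parameters(), lr=1e-2)
for _ in range(100):
    normals = geometry.render_normals()
    normals.backward(gradient=sds_grad(normals))  # inject the SDS-style gradient
    opt_g.step(); opt_g.zero_grad()

# Stage 2: appearance learning with the geometry frozen.
opt_a = torch.optim.Adam(appearance.parameters(), lr=1e-2)
for _ in range(100):
    with torch.no_grad():
        normals = geometry.render_normals()
    rgb = appearance.render_rgb(normals)
    rgb.backward(gradient=sds_grad(rgb))
    opt_a.step(); opt_a.zero_grad()
```

Because the two stages touch disjoint parameter sets, the learned mesh and BRDF materials can be exported separately, which is what makes the result directly usable in standard graphics engines for relighting, editing, and simulation.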