With the evolution of text-to-image diffusion models, significant strides have been made in text-to-3D generation. Currently, two primary paradigms dominate the field of text-to-3D: feed-forward generation solutions, capable of swiftly producing 3D assets but often yielding coarse results, and Score Distillation Sampling (SDS) based solutions, known for generating high-fidelity 3D assets albeit at a slower pace. The synergistic integration of these methods holds substantial promise for advancing 3D generation techniques. In this paper, we present BoostDream, a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones. The BoostDream framework comprises three distinct processes: (1) We introduce 3D model distillation, which fits differentiable representations to the 3D assets obtained through feed-forward generation. (2) We design a novel multi-view SDS loss, which uses a multi-view-aware 2D diffusion model to refine the 3D assets. (3) We propose to use prompts and multi-view-consistent normal maps as guidance during refinement. Our extensive experiments on different differentiable 3D representations reveal that BoostDream generates high-quality 3D assets rapidly and overcomes the Janus problem that afflicts conventional SDS-based methods. This breakthrough signifies a substantial advancement in both the efficiency and quality of 3D generation.
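For context, the multi-view SDS loss in step (2) builds on the standard single-view SDS gradient introduced by DreamFusion; a sketch of that baseline formulation follows, while the multi-view extension (conditioning the noise predictor on multiple camera views jointly) is the paper's contribution and is only described qualitatively above. Following the common convention, $\theta$ parameterizes the differentiable 3D representation, $g(\theta)$ renders an image $x$, and $\hat{\epsilon}_\phi$ is the frozen 2D diffusion model's noise prediction:

$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\phi, x = g(\theta)) = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y, t) - \epsilon\big)\, \frac{\partial x}{\partial \theta} \right]$$

where $x_t$ is the noised rendering at diffusion timestep $t$, $y$ is the text prompt, $\epsilon \sim \mathcal{N}(0, I)$, and $w(t)$ is a timestep-dependent weighting function. Intuitively, the gradient pushes the rendered views toward images the diffusion model deems plausible for the prompt; the multi-view variant applies this signal across consistent viewpoints, which is what mitigates the Janus problem.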