Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting

Score Distillation Sampling (SDS) leverages pretrained 2D diffusion models to advance text-to-3D generation, but it neglects correlations across views, which makes the generated 3D content prone to geometric inconsistencies and multi-face artifacts. In this work, we propose Coupled Score Distillation (CSD), a framework that couples multi-view joint distribution priors to ensure geometrically consistent 3D generation while enabling stable, direct optimization of 3D Gaussian Splatting (3D-GS). Specifically, by reformulating the optimization as a multi-view joint optimization problem, we derive an optimization rule that couples multi-view priors to guide optimization across viewpoints while preserving the diversity of the generated 3D assets. Additionally, our framework directly optimizes a randomly initialized 3D-GS representation to generate geometrically consistent 3D content. We further employ a deformable tetrahedral grid, initialized from the 3D-GS and refined through CSD, to produce high-quality meshes. Quantitative and qualitative experiments demonstrate the efficiency and competitive quality of our approach.
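
For context, the SDS baseline that the abstract contrasts against is usually written as the gradient below. This is the standard form from the original SDS formulation (DreamFusion), not an equation taken from this abstract, and every symbol is an assumption introduced here for illustration: $\theta$ are the 3D parameters, $g$ is a differentiable renderer, $c$ is a randomly sampled camera, $y$ is the text prompt, $\hat{\epsilon}_\phi$ is the pretrained 2D noise predictor, and $w(t)$ is a timestep weighting. The CSD update rule itself is not stated in the abstract and is not reproduced here.

```latex
% Standard SDS gradient (assumed notation, not from the abstract)
\begin{aligned}
x &= g(\theta, c), \qquad
x_t = \alpha_t\, x + \sigma_t\, \epsilon, \qquad
\epsilon \sim \mathcal{N}(0, I), \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  &= \mathbb{E}_{t,\,\epsilon,\,c}\!\left[
       w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
       \frac{\partial x}{\partial \theta}
     \right].
\end{aligned}
```

Read this way, each camera $c$ contributes its own independent single-view term, and nothing in the expectation ties the renderings from different viewpoints together. That appears to be the sense in which the abstract says SDS neglects multi-view correlations, and it is the gap a jointly formulated multi-view objective is meant to close.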