Recent advances in zero-shot text-to-3D generation have revolutionized 3D content creation by enabling direct synthesis from textual descriptions. State-of-the-art methods combine 3D Gaussian Splatting with score distillation to enhance multi-view rendering through pre-trained text-to-image (T2I) models, but they suffer from view biases inherent in these T2I priors. These biases lead to inconsistent 3D generation, most notably the multi-face Janus problem, in which objects exhibit conflicting features across views. To address this fundamental challenge, we propose ConsDreamer, a novel method that mitigates view bias by refining both the conditional and unconditional terms of the score distillation process: (1) a View Disentanglement Module (VDM) that eliminates viewpoint biases in conditional prompts by decoupling irrelevant view components and injecting precise view control; and (2) a similarity-based partial order loss that enforces geometric consistency in the unconditional term by aligning cosine similarities with azimuth relationships. Extensive experiments demonstrate that ConsDreamer can be seamlessly integrated into various 3D representations and score distillation paradigms, effectively mitigating the multi-face Janus problem.
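The abstract does not specify the exact form of the similarity-based partial order loss. The following is a minimal sketch of the underlying idea, assuming a hinge-style ranking penalty over per-view feature vectors compared against a reference view: a view closer in azimuth to the reference should be at least as similar (by cosine similarity) as a view farther away. The function names, the `margin` parameter, and the choice of feature vectors are hypothetical illustrations, not ConsDreamer's implementation.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def azimuth_distance(a: float, b: float) -> float:
    """Smallest absolute angular difference between two azimuths in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def partial_order_loss(
    ref_feat: Sequence[float],
    ref_azimuth: float,
    feats: Sequence[Sequence[float]],
    azimuths: Sequence[float],
    margin: float = 0.0,
) -> float:
    """Hypothetical hinge-based partial order loss.

    For every pair of views, the one nearer in azimuth to the reference view
    should have a higher cosine similarity to the reference than the farther one.
    Each violation (beyond `margin`) is penalized; the result is averaged over
    all ordered pairs.
    """
    sims = [cosine_similarity(ref_feat, f) for f in feats]
    dists = [azimuth_distance(ref_azimuth, a) for a in azimuths]
    total, pairs = 0.0, 0
    for i, j in combinations(range(len(feats)), 2):
        if dists[i] == dists[j]:
            continue  # equidistant views impose no ordering constraint
        near, far = (i, j) if dists[i] < dists[j] else (j, i)
        total += max(0.0, margin + sims[far] - sims[near])
        pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    ref = [1.0, 0.0]  # toy feature of a reference view at azimuth 0 degrees
    consistent = partial_order_loss(ref, 0.0, [[1.0, 0.1], [0.0, 1.0]], [30.0, 150.0])
    violated = partial_order_loss(ref, 0.0, [[0.0, 1.0], [1.0, 0.1]], [30.0, 150.0])
    print(consistent, violated)
```

In the toy usage, the consistent configuration (the 30° view resembles the reference more than the 150° view) incurs zero loss, while swapping the features produces a positive penalty. A ranking formulation like this only constrains the relative order of similarities, which matches the "partial order" framing, rather than forcing similarities to specific target values.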