We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject. Our approach combines recent advances in personalizing text-to-image models (DreamBooth) with text-to-3D generation (DreamFusion). We find that naively combining these methods fails to yield satisfactory subject-specific 3D assets due to personalized text-to-image models overfitting to the input viewpoints of the subject. We overcome this through a 3-stage optimization strategy where we jointly leverage the 3D consistency of neural radiance fields together with the personalization capability of text-to-image models. Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject.
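To make the 3-stage strategy concrete, the following is a minimal Python sketch of one plausible reading of the pipeline: a partially fine-tuned DreamBooth drives an initial SDS-optimized NeRF, rendered views are translated through a fully trained DreamBooth into pseudo multi-view data, and a final fine-tune plus NeRF optimization with a reconstruction loss produces the asset. All helper functions (`finetune_dreambooth`, `optimize_nerf_sds`, `render_multiview`, `img2img_translate`), the `Model` stand-in, and the step counts are hypothetical placeholders for the actual DreamBooth and DreamFusion training loops, not a real API.

```python
"""Hypothetical sketch of the 3-stage DreamBooth3D pipeline.

Every helper below is a placeholder; real stages train a text-to-image
diffusion model (DreamBooth) and optimize a NeRF with score distillation
sampling (DreamFusion-style SDS).
"""
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Model:
    """Stand-in for a diffusion model or NeRF; only tracks bookkeeping."""
    name: str
    steps: int = 0
    data: List[str] = field(default_factory=list)


def finetune_dreambooth(base: Model, images: List[str], steps: int) -> Model:
    # Hypothetical: DreamBooth fine-tuning on subject images. Fewer steps
    # means less overfitting to the input viewpoints.
    return Model(f"{base.name}+db", base.steps + steps, base.data + images)


def optimize_nerf_sds(t2i: Model, prompt: str,
                      recon_views: Optional[List[str]] = None) -> Model:
    # Hypothetical: NeRF optimization via SDS against the text-to-image
    # model, optionally adding a reconstruction loss on pseudo views.
    nerf = Model(f"nerf({prompt})")
    nerf.data = list(recon_views or [])
    return nerf


def render_multiview(nerf: Model, n_views: int) -> List[str]:
    # Hypothetical: render the NeRF from n_views camera poses.
    return [f"{nerf.name}/view_{i}.png" for i in range(n_views)]


def img2img_translate(t2i: Model, views: List[str], prompt: str) -> List[str]:
    # Hypothetical: Img2Img translation of rendered views through the
    # personalized model to restore subject identity per view.
    return [f"translated_{v}" for v in views]


def dreambooth3d(base_t2i: Model, subject_images: List[str], prompt: str) -> Model:
    # Stage 1: a *partially* trained DreamBooth stays 3D-consistent under
    # SDS because it has not yet overfit to the input viewpoints.
    partial_db = finetune_dreambooth(base_t2i, subject_images, steps=400)
    initial_nerf = optimize_nerf_sds(partial_db, prompt)

    # Stage 2: render novel views of the initial NeRF, then translate them
    # with a fully trained DreamBooth to get subject-faithful pseudo views.
    full_db = finetune_dreambooth(base_t2i, subject_images, steps=1200)
    rendered = render_multiview(initial_nerf, n_views=8)
    pseudo_views = img2img_translate(full_db, rendered, prompt)

    # Stage 3: fine-tune on real + pseudo views, then optimize the final
    # NeRF with SDS plus a reconstruction loss on the pseudo views.
    final_db = finetune_dreambooth(full_db, subject_images + pseudo_views,
                                   steps=400)
    return optimize_nerf_sds(final_db, prompt, recon_views=pseudo_views)


if __name__ == "__main__":
    base = Model("text-to-image-base")
    asset = dreambooth3d(base, ["img_0.jpg", "img_1.jpg", "img_2.jpg"],
                         prompt="a photo of sks dog wearing a hat")
    print(asset.name, len(asset.data))
```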