We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. While recent methods have produced encouraging results for text-guided generation of common 3D objects, generating high-quality human avatars remains an open challenge due to the complexity of the human body's shape, pose, and appearance. We propose DreamAvatar to tackle this challenge; it utilizes a trainable NeRF to predict density and color features for 3D points, and a pre-trained text-to-image diffusion model to provide 2D self-supervision. Specifically, we leverage SMPL models to provide rough pose and shape guidance for the generation. We introduce a dual-space design comprising a canonical space and an observation space, related by a learnable deformation field through the NeRF, which allows well-optimized texture and geometry to be transferred from the canonical space to the target posed avatar. Additionally, we exploit a normal-consistency regularization to enable more vivid generation with detailed geometry and texture. Through extensive evaluations, we demonstrate that DreamAvatar significantly outperforms existing methods, establishing a new state of the art for text-and-shape guided 3D human generation.
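The dual-space design described above can be sketched as follows: a point sampled in the observation (posed) space is first warped into the canonical space by a deformation field, and the shared NeRF is then queried at the warped location for density and color. This is a minimal, hypothetical illustration only; the toy deformation (a fixed offset) stands in for the paper's SMPL-driven learnable deformation, and the random two-layer MLP stands in for the trained NeRF, which in DreamAvatar is optimized with 2D self-supervision from a diffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "NeRF" MLP weights (random here; in the paper this network is
# trained with supervision from a pre-trained text-to-image diffusion model).
W1 = rng.normal(size=(3, 32))
W2 = rng.normal(size=(32, 4))  # outputs: 1 density channel + 3 RGB channels


def deformation_field(x_obs):
    """Hypothetical stand-in for the learnable deformation that maps an
    observation-space (posed) point back to the canonical space.
    DreamAvatar drives this with the SMPL body model plus a learned
    component; here a fixed smooth warp is used purely for illustration."""
    return x_obs - 0.1 * np.sin(x_obs)


def query_nerf(x_canonical):
    """Toy NeRF query: predict density and color at a canonical-space point."""
    h = np.tanh(x_canonical @ W1)
    out = h @ W2
    density = np.log1p(np.exp(out[..., :1]))      # softplus -> non-negative
    color = 1.0 / (1.0 + np.exp(-out[..., 1:]))   # sigmoid  -> [0, 1]
    return density, color


# A point sampled in the observation (posed) space is rendered by first
# warping it into the canonical space, then querying the shared NeRF,
# so texture and geometry optimized in the canonical space carry over
# to any target pose.
x_obs = np.array([[0.2, -0.5, 1.0]])
density, color = query_nerf(deformation_field(x_obs))
```

Because the same canonical NeRF is queried for every pose, the deformation field alone determines how the optimized avatar is re-posed, which is what makes the generated texture and geometry transferable across poses.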