Neural implicit fields are powerful for representing 3D scenes and generating high-quality novel views, but using such implicit representations to create a 3D human avatar that has a specific identity and artistic style and can be easily animated remains challenging. Our proposed method, AvatarCraft, addresses this challenge by using diffusion models to guide the learning of geometry and texture for a neural avatar from a single text prompt. We carefully design the optimization framework of the neural implicit field, including a coarse-to-fine multi-bounding-box training strategy, shape regularization, and diffusion-based constraints, to produce high-quality geometry and texture. Additionally, we make the human avatar animatable by deforming the neural implicit field with an explicit warping field that maps the target human mesh to a template human mesh, both represented using parametric human models. The generated avatar can therefore be animated and reshaped simply by controlling pose and shape parameters. Extensive experiments on various text descriptions show that AvatarCraft is effective and robust in creating human avatars and rendering novel views, poses, and shapes. Our project page is: \url{https://avatar-craft.github.io/}.
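To make the idea of an explicit warping field concrete, the following is a minimal sketch and not the AvatarCraft implementation. It assumes that the target and template meshes share vertex ordering (as meshes produced by the same parametric human model would) and that each query point simply follows the displacement of its nearest target-mesh vertex. The function name `warp_to_template` and the nearest-vertex rule are illustrative assumptions; the abstract does not specify how the warp is computed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def warp_to_template(
    points: Sequence[Point],
    target_verts: Sequence[Point],
    template_verts: Sequence[Point],
) -> List[Point]:
    """Map query points from target (posed) space to template space.

    Assumption (hypothetical simplification): every point is translated by
    the offset of its nearest target-mesh vertex to the matching template
    vertex. Both meshes must list corresponding vertices in the same order.
    """
    if len(target_verts) != len(template_verts):
        raise ValueError("meshes must have the same number of vertices")
    if not target_verts:
        raise ValueError("meshes must contain at least one vertex")

    warped: List[Point] = []
    for p in points:
        # Index of the target-mesh vertex closest to the query point.
        nearest = min(
            range(len(target_verts)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, target_verts[i])),
        )
        # Displacement that carries this target vertex onto the template.
        offset = tuple(
            c - t for c, t in zip(template_verts[nearest], target_verts[nearest])
        )
        warped.append(tuple(a + o for a, o in zip(p, offset)))
    return warped
```

Under these assumptions, a field defined in template space could be queried for a posed avatar by first warping each sample point, for example `warp_to_template(samples, posed_mesh_vertices, template_mesh_vertices)`, and then evaluating the field at the warped locations. Changing the pose or shape parameters only changes the target mesh, so the field itself would not need to be retrained.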