This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Some recent works have proposed to decompose a non-rigidly deforming scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, thereby enabling them to learn the dynamic scene from images. However, they represent the deformation field as a translational vector field or an SE(3) field, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce neural blend weight fields to produce the deformation fields. Based on skeleton-driven deformation, the blend weight fields are used together with 3D human skeletons to generate observation-to-canonical and canonical-to-observation correspondences. Since 3D human skeletons are more observable, they can regularize the learning of the deformation fields. Moreover, the learned blend weight fields can be combined with input skeletal motions to generate new deformation fields that animate the human model. Experiments show that our approach significantly outperforms recent human synthesis methods. The code and supplementary materials are available at https://zju3dv.github.io/animatable_nerf/.
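To make the skeleton-driven deformation concrete, the sketch below (not the authors' released code; names such as `BlendWeightField` and `observation_to_canonical`, and the choice of 24 joints, are illustrative assumptions) shows how an MLP-predicted blend weight field can be combined with per-joint bone transforms to pull observation-space ray samples back to the canonical space via inverse linear blend skinning.

```python
# A minimal sketch of skeleton-driven deformation with a neural blend weight
# field, assuming K skeleton joints with known per-frame bone transforms G_k
# (4x4 matrices, canonical-to-observation). An MLP predicts blend weights
# w_k(x) at each observation-space point x; the observation-to-canonical
# correspondence is the inverse of the blended transform applied to x.

import torch
import torch.nn as nn

K = 24  # number of skeleton joints (hypothetical, SMPL-style skeleton)

class BlendWeightField(nn.Module):
    """MLP mapping a 3D point to K blend weights that sum to one."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, K),
        )

    def forward(self, x):                      # x: (N, 3) points
        return torch.softmax(self.mlp(x), dim=-1)   # (N, K) blend weights

def observation_to_canonical(x_obs, bone_transforms, weight_field):
    """Map observation-space points to the canonical space.

    x_obs:           (N, 3) points sampled along camera rays in one frame.
    bone_transforms: (K, 4, 4) per-joint canonical-to-observation transforms.
    """
    w = weight_field(x_obs)                                    # (N, K)
    # Blend the bone transforms with the predicted weights, then invert the
    # blended transform to pull points back into the canonical pose.
    G = torch.einsum('nk,kij->nij', w, bone_transforms)        # (N, 4, 4)
    ones = torch.ones_like(x_obs[:, :1])
    x_h = torch.cat([x_obs, ones], dim=-1)                     # homogeneous coords
    x_can = torch.linalg.solve(G, x_h.unsqueeze(-1)).squeeze(-1)[:, :3]
    return x_can                                               # (N, 3)
```

Under this formulation, animating the reconstructed model amounts to reusing the learned canonical blend weights with bone transforms computed from a novel skeletal motion, which is what makes the deformation explicitly controllable by input motions.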