This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Recent works have proposed to decompose a dynamic scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, thereby enabling them to learn the dynamic scene from images. However, they represent the deformation field as a translational vector field or an SE(3) field, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce neural blend weight fields to produce the deformation fields. Based on skeleton-driven deformation, the blend weight fields are used together with 3D human skeletons to generate observation-to-canonical and canonical-to-observation correspondences. Since 3D human skeletons are more observable, they can regularize the learning of the deformation fields. Moreover, the learned blend weight fields can be combined with input skeletal motions to generate new deformation fields to animate the human model. Experiments show that our approach significantly outperforms recent human synthesis methods. The code will be available at https://zju3dv.github.io/animatable_nerf/.
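For concreteness, the skeleton-driven deformation referred to in the abstract is typically formulated as linear blend skinning; the sketch below is illustrative rather than the paper's exact formulation, and the notation ($w_k$ for the blend weight of the $k$-th body part, $G_k$ for its skeletal transformation) is assumed:
$$
\mathbf{x}' = \left( \sum_{k=1}^{K} w_k(\mathbf{x})\, G_k \right) \mathbf{x},
$$
where $\mathbf{x}$ is a point in the canonical space, $G_k \in SE(3)$ is the transformation of the $k$-th body part derived from the 3D skeleton, and $w_k(\mathbf{x})$ is the blend weight field evaluated at $\mathbf{x}$. The observation-to-canonical correspondence follows analogously by applying the inverse transformations with blend weights defined in the observation space.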