This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Some recent works have proposed to decompose a non-rigidly deforming scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, thereby enabling them to learn the dynamic scene from images. However, they represent the deformation field as a translational vector field or an SE(3) field, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce a pose-driven deformation field based on the linear blend skinning algorithm, which combines the blend weight field and the 3D human skeleton to produce observation-to-canonical correspondences. Since 3D human skeletons are more observable, they can regularize the learning of the deformation field. In addition, the pose-driven deformation field can be controlled by input skeletal motions to generate new deformation fields that animate the canonical human model. Experiments show that our approach significantly outperforms recent human modeling methods. The code is available at https://zju3dv.github.io/animatable_nerf/.
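For context, a standard linear blend skinning (LBS) mapping from observation space to canonical space can be written as the following sketch; the notation (blend weights $w_k$, per-bone rigid transforms $G_k$) is a generic LBS formulation for illustration, not a verbatim statement of the paper's equations:

$$\mathbf{x}_{\text{can}} = \Big( \sum_{k=1}^{K} w_k(\mathbf{x})\, G_k \Big)^{-1} \mathbf{x}, \qquad \sum_{k=1}^{K} w_k(\mathbf{x}) = 1,$$

where $\mathbf{x}$ is an observation-space point, $w_k(\mathbf{x})$ is the blend weight of the $k$-th body part at $\mathbf{x}$ (produced by the blend weight field), and $G_k$ is the rigid transform of the $k$-th bone determined by the 3D human skeleton pose. Under this view, driving the model with a new skeletal pose amounts to swapping in new transforms $G_k$, which is what makes the deformation field explicitly controllable by input motions.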