While deep learning has reshaped the classical motion capture pipeline with feed-forward networks, generative models are still required to recover fine alignment via iterative refinement. Unfortunately, existing generative models are usually hand-crafted or learned in controlled conditions, and are therefore applicable only to limited domains. We propose a method to learn a generative neural body model from unlabelled monocular videos by extending Neural Radiance Fields (NeRFs). We equip them with a skeleton so that they apply to time-varying, articulated motion. A key insight is that implicit models require the inverse of the forward kinematics used in explicit surface models. Our reparameterization defines spatial latent variables relative to the pose of body parts and thereby overcomes ill-posed inverse operations with an overparameterization. This enables learning volumetric body shape and appearance from scratch while jointly refining the articulated pose, all without ground-truth labels for appearance, pose, or 3D shape on the input videos. When used for novel-view synthesis and motion capture, our neural model improves accuracy on diverse datasets. Project website: https://lemonatsu.github.io/anerf/ .
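To make the reparameterization idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each body part is a rigid bone with a known bone-to-world rotation and origin, and it expresses a single world-space query point in every bone's local frame by inverting that rigid transform. The function names, the two-bone skeleton, and the choice of local offset and distance as features are all hypothetical; the abstract does not specify the actual encoding.

```python
import math

def rot_z(angle):
    """3x3 rotation about the z-axis (row-major nested lists)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def to_bone_frame(rotation, translation, point):
    """Invert a rigid bone-to-world transform: x_local = R^T (x_world - t)."""
    d = [point[i] - translation[i] for i in range(3)]
    # Multiplying by R^T: component j is the dot product of column j of R with d.
    return [sum(rotation[i][j] * d[i] for i in range(3)) for j in range(3)]

def bone_relative_encoding(bones, point):
    """Express one world-space query point in every bone's local frame.

    The result is overparameterized on purpose: one 3D point becomes
    len(bones) local offsets, so no single inverse mapping from world
    space to a canonical pose has to be chosen.
    """
    encoding = []
    for rotation, translation in bones:
        local = to_bone_frame(rotation, translation, point)
        distance = math.sqrt(sum(v * v for v in local))
        encoding.append({"local": local, "distance": distance})
    return encoding

# Hypothetical two-bone skeleton: (bone-to-world rotation, bone origin in world space).
bones = [
    (rot_z(0.0), [0.0, 0.0, 0.0]),
    (rot_z(math.pi / 2), [1.0, 0.0, 0.0]),
]
query = [1.0, 2.0, 0.0]
encoding = bone_relative_encoding(bones, query)
```

In a NeRF-style model, features of this kind would be fed to the network in place of the raw world-space coordinate, so the learned density and colour move with the skeleton as the pose changes.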