We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame and body pose. This task is particularly challenging, as it requires synthesizing photorealistic details of the body, as seen from various camera angles that may not exist in the input video, as well as synthesizing fine details such as cloth folds and facial appearance. Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps. The motion field is decomposed into skeletal rigid and non-rigid motions, produced by deep networks. We show significant performance improvements over prior work, and compelling examples of free-viewpoint renderings from monocular video of moving humans in challenging uncontrolled capture scenarios.
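To make the motion decomposition concrete, the following is a minimal notational sketch under assumed notation, not taken verbatim from the abstract; the symbols $T$, $T_{\text{skel}}$, $T_{\text{NR}}$, $F_{\text{can}}$, and the pose vector $p$ are illustrative names. A sample point $\mathbf{x}$ in observation space at body pose $p$ is backward-warped into the canonical T-pose, where the optimized volume yields color and density:

$$
\mathbf{x}_{\text{can}} = T(\mathbf{x}, p) = T_{\text{skel}}(\mathbf{x}, p) + T_{\text{NR}}\!\big(T_{\text{skel}}(\mathbf{x}, p),\, p\big), \qquad (\mathbf{c}, \sigma) = F_{\text{can}}(\mathbf{x}_{\text{can}}),
$$

where $T_{\text{skel}}$ would be the skeleton-driven rigid warp, $T_{\text{NR}}$ a non-rigid offset produced by a deep network, and $F_{\text{can}}$ the canonical volumetric representation; the warped samples are then composited with standard volume rendering along each camera ray.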