We present neural radiance fields for rendering and temporal (4D) reconstruction of humans in motion (H-NeRF), as captured by a sparse set of cameras or even from a monocular video. Our approach combines ideas from neural scene representation, novel-view synthesis, and implicit statistical geometric human representations, coupled using novel loss functions. Instead of learning a radiance field with a uniform occupancy prior, we constrain it by a structured implicit human body model, represented using signed distance functions. This allows us to robustly fuse information from sparse views and to generalize well beyond the poses and views observed in training. Moreover, we apply geometric constraints to co-learn the structure of the observed subject, including both body and clothing, and to regularize the radiance field toward geometrically plausible solutions. Extensive experiments on multiple datasets demonstrate the robustness and accuracy of our approach, its ability to generalize significantly beyond a small training set of poses and views, and its capacity for statistical extrapolation beyond the observed body shape.
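As a hedged illustration of how a signed distance function can constrain a radiance field (a construction common in SDF-driven volume rendering; the exact H-NeRF formulation is not given in this abstract, and the symbols $d$, $\alpha$, $\beta$ below are assumptions), the rendering density can be derived from the body model's signed distance so that opacity concentrates near the predicted surface:

$$\sigma(\mathbf{x}) = \alpha \cdot \mathrm{sigmoid}\big(-d(\mathbf{x})/\beta\big), \qquad C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt, \qquad T(t) = \exp\Big(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\Big),$$

where $d(\mathbf{x})$ is the signed distance to the implicit body surface (negative inside), $\alpha$ and $\beta$ are an assumed learnable scale and sharpness, and $C(\mathbf{r})$ is the standard volume-rendering integral along a camera ray $\mathbf{r}$ with view direction $\mathbf{d}$. Under such a parameterization, geometric losses on $d$ directly regularize the density field, which is one way the stated geometric constraints could bias the radiance field toward plausible body geometry.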