We propose a new method for reconstructing controllable implicit 3D human models from sparse multi-view RGB videos. Our method defines the neural scene representation in terms of surface points on, and signed distances from, a human body mesh. We identify an indistinguishability issue that arises when a point in 3D space is mapped to its nearest surface point on the mesh for learning a surface-aligned neural scene representation. To address this issue, we propose projecting a point onto the mesh surface using barycentric interpolation with modified vertex normals. Experiments on the ZJU-MoCap and Human3.6M datasets show that our approach achieves higher quality in novel-view and novel-pose synthesis than existing methods. We also demonstrate that our method easily supports control over body shape and clothing.
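As a rough, simplified sketch of the surface-alignment idea (not the paper's actual implementation, which solves for the surface point jointly with the modified normals), the snippet below projects a query point orthogonally onto a single triangle, recovers its barycentric coordinates, and uses them to interpolate a surface point and a unit normal from the given vertex normals; the signed distance is then measured along that interpolated normal. The function names and the single-triangle setup are illustrative assumptions.

```python
import numpy as np

def barycentric_coords(p, a, b, c):
    # Barycentric coordinates of p's orthogonal projection onto triangle (a, b, c).
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - v - w, v, w])

def surface_aligned_coords(p, verts, vert_normals):
    # verts: (3, 3) triangle vertices; vert_normals: (3, 3) per-vertex normals
    # (in the paper these would be the modified vertex normals).
    bary = barycentric_coords(p, *verts)
    s = bary @ verts                 # interpolated surface point
    n = bary @ vert_normals
    n = n / np.linalg.norm(n)        # interpolated unit normal
    h = (p - s) @ n                  # signed distance along the normal
    return s, n, h
```

For example, for a triangle in the z = 0 plane with all vertex normals pointing along +z, a point at height 0.5 above the triangle yields its footprint as the surface point and a signed distance of 0.5.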