We propose a new method for learning a generalized animatable neural human representation from a sparse set of multi-view images of multiple persons. The learned representation can be used to synthesize novel-view images of an arbitrary person from a sparse set of cameras, and to further animate that person under user pose control. While existing methods can either generalize to new persons or synthesize animations under user control, none of them achieves both at the same time. We attribute this capability to the use of a 3D proxy shared by the multi-person human model, together with a warping of the spaces of different poses into a shared canonical pose space, in which we learn a neural field that predicts person- and pose-dependent deformations as well as appearance from features extracted from the input images. To cope with the large variations in body shapes, poses, and clothing deformations, we design our neural human model with disentangled geometry and appearance. Furthermore, we utilize image features both at the spatial query point and on the surface points of the 3D proxy to predict person- and pose-dependent properties. Experiments show that our method significantly outperforms the state of the art on both tasks. The video and code are available at https://talegqz.github.io/neural_novel_actor.
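To make the described architecture concrete, below is a minimal PyTorch sketch of the pipeline the abstract outlines: a query point in posed space is warped to a shared canonical space via nearest vertices of a 3D body proxy, and two disentangled MLP heads predict a pose-dependent geometry residual and appearance from pixel-aligned features. This is an illustrative assumption, not the authors' implementation; `CanonicalHumanField`, `warp_to_canonical`, the feature dimensions, and the softmax blend weights are all hypothetical choices.

```python
# Hypothetical sketch of a canonical-space human field with disentangled
# geometry and appearance heads. Names and dimensions are illustrative only.
import torch
import torch.nn as nn


class CanonicalHumanField(nn.Module):
    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        # Geometry head: canonical point + surface-anchored feature
        # -> (density, pose-dependent xyz deformation residual)
        self.geometry = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + 3),
        )
        # Appearance head: deformed canonical point + pixel-aligned feature -> RGB
        self.appearance = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def warp_to_canonical(self, x_posed, verts_posed, verts_canonical, k=4):
        # Carry each query point by the blended offset of its k nearest
        # proxy vertices from posed space to canonical space.
        d = torch.cdist(x_posed, verts_posed)                  # (N, V)
        knn = d.topk(k, dim=-1, largest=False)                 # values/indices (N, k)
        w = torch.softmax(-knn.values, dim=-1)                 # (N, k) blend weights
        delta = verts_canonical[knn.indices] - verts_posed[knn.indices]  # (N, k, 3)
        return x_posed + (w.unsqueeze(-1) * delta).sum(dim=1)

    def forward(self, x_posed, verts_posed, verts_canonical, point_feat, surf_feat):
        x_can = self.warp_to_canonical(x_posed, verts_posed, verts_canonical)
        g = self.geometry(torch.cat([x_can, surf_feat], dim=-1))
        density, residual = g[:, :1], g[:, 1:]
        rgb = self.appearance(torch.cat([x_can + residual, point_feat], dim=-1))
        return density, rgb


# Example usage with random stand-ins for the proxy mesh and image features:
model = CanonicalHumanField()
x = torch.rand(1024, 3)                    # sampled query points along rays
vp, vc = torch.rand(6890, 3), torch.rand(6890, 3)  # posed / canonical proxy vertices
pf, sf = torch.rand(1024, 32), torch.rand(1024, 32)  # pixel-aligned / surface features
density, rgb = model(x, vp, vc, pf, sf)
```

Splitting the field into separate geometry and appearance heads reflects the abstract's disentangled design: the geometry head conditions on surface-anchored proxy features to model shape and clothing deformation, while the appearance head conditions on pixel-aligned features from the input views.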