Acquisition and rendering of photo-realistic human heads is a highly challenging research problem of particular importance for virtual telepresence. Currently, the highest quality is achieved by volumetric approaches trained in a person-specific manner on multi-view data. These models represent fine structure, such as hair, better than simpler mesh-based models. Volumetric models typically employ a global code to represent facial expressions, so that they can be driven by a small set of animation parameters. While such architectures achieve impressive rendering quality, they cannot easily be extended to the multi-identity setting. In this paper, we devise a novel approach for predicting volumetric avatars of the human head given just a small number of inputs. We enable generalization across identities with a novel parameterization that combines neural radiance fields with local, pixel-aligned features extracted directly from the inputs, thus sidestepping the need for very deep or complex networks. Our approach is trained in an end-to-end manner solely based on a photometric re-rendering loss, without requiring explicit 3D supervision. We demonstrate that our approach outperforms the existing state of the art in terms of quality and is able to generate faithful facial expressions in a multi-identity setting.
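The core idea of pixel-aligned conditioning can be illustrated with a minimal sketch: each 3D query point along a camera ray is projected into every input view, a feature is bilinearly sampled from that view's feature map at the projected pixel, and the per-view features are aggregated into a conditioning vector for the radiance-field MLP. The sketch below is an illustrative assumption, not the paper's implementation; the camera intrinsics `K`, extrinsics `Rt`, and the random feature maps (stand-ins for CNN encoder outputs) are all hypothetical.

```python
import numpy as np

def project(points, K, Rt):
    """Project Nx3 world points into pixel coordinates with a pinhole camera."""
    cam = (Rt[:3, :3] @ points.T + Rt[:3, 3:4]).T        # world -> camera frame
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]                        # perspective divide

def bilinear_sample(feat, uv):
    """Sample an HxWxC feature map at continuous pixel locations (Nx2)."""
    H, W, _ = feat.shape
    x = np.clip(uv[:, 0], 0, W - 1.001)
    y = np.clip(uv[:, 1], 0, H - 1.001)
    x0, y0 = x.astype(int), y.astype(int)
    dx, dy = (x - x0)[:, None], (y - y0)[:, None]
    return (feat[y0, x0] * (1 - dx) * (1 - dy) + feat[y0, x0 + 1] * dx * (1 - dy)
            + feat[y0 + 1, x0] * (1 - dx) * dy + feat[y0 + 1, x0 + 1] * dx * dy)

# Toy setup: two input views with random 8-channel feature maps
# (stand-ins for features produced by a convolutional encoder).
rng = np.random.default_rng(0)
K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
views = []
for _ in range(2):
    Rt = np.eye(4)
    Rt[2, 3] = 2.0                                       # camera 2 units back
    views.append((K, Rt, rng.standard_normal((64, 64, 8))))

pts = rng.uniform(-0.2, 0.2, size=(16, 3))               # query points near origin
feats = []
for K_i, Rt_i, fmap in views:
    uv = project(pts, K_i, Rt_i)
    feats.append(bilinear_sample(fmap, uv))
cond = np.mean(feats, axis=0)                            # aggregate across views
print(cond.shape)                                        # one conditioning vector per point
```

In the full model, this per-point conditioning vector would be concatenated with a positional encoding of the query point and fed to the radiance-field MLP predicting density and color; because the features are tied to input pixels rather than a global identity code, the same network can generalize across identities.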