This paper presents a neural rendering method for controllable portrait video synthesis. Recent advances in volumetric neural rendering, such as neural radiance fields (NeRF), have enabled photorealistic novel view synthesis of static scenes with impressive results. However, modeling dynamic and controllable objects as part of a scene with such representations remains challenging. In this work, we design a system that enables both novel view synthesis of portrait videos, covering the human subject and the scene background, and explicit control of facial expressions through a low-dimensional expression representation. We leverage the expression space of a 3D morphable face model (3DMM) to represent the distribution of human facial expressions, and use it to condition the NeRF volumetric function. Furthermore, we impose a spatial prior derived from 3DMM fitting to guide the network to learn disentangled control of scene appearance and facial actions. We demonstrate the effectiveness of our method on free-viewpoint synthesis of portrait videos with expression control. To train on a new scene, our method requires only a short video of the subject captured by a mobile device.
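As a rough illustration of the conditioning described above, the sketch below shows a NeRF-style MLP that takes a positionally encoded 3D point concatenated with a low-dimensional 3DMM expression code, so the rendered face can be driven at inference time. This is a minimal sketch, not the authors' actual architecture: the class name, layer sizes, omission of view-direction input, and the expression dimensionality (76 coefficients here) are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class ExpressionConditionedNeRF(nn.Module):
    """Minimal sketch: a NeRF-style MLP whose input is a positionally
    encoded 3D sample point concatenated with a low-dimensional 3DMM
    expression code, so facial expressions can be controlled at render
    time. Layer sizes and the expression dimensionality are illustrative,
    not taken from the paper."""

    def __init__(self, pos_freqs: int = 10, expr_dim: int = 76, hidden: int = 256):
        super().__init__()
        self.pos_freqs = pos_freqs
        # raw point (3) + sin/cos encoding (3 * 2 * pos_freqs) + expression code
        in_dim = 3 + 3 * 2 * pos_freqs + expr_dim
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB (3 channels) + volume density (1 channel)
        )

    def positional_encoding(self, x: torch.Tensor) -> torch.Tensor:
        # Standard NeRF frequency encoding: sin/cos at octave-spaced frequencies.
        feats = [x]
        for i in range(self.pos_freqs):
            feats += [torch.sin((2.0 ** i) * x), torch.cos((2.0 ** i) * x)]
        return torch.cat(feats, dim=-1)

    def forward(self, points: torch.Tensor, expr: torch.Tensor) -> torch.Tensor:
        # points: (N, 3) sample locations along camera rays
        # expr:   (N, expr_dim) 3DMM expression coefficients, one per sample
        h = torch.cat([self.positional_encoding(points), expr], dim=-1)
        out = self.mlp(h)
        rgb = torch.sigmoid(out[..., :3])   # colors constrained to [0, 1]
        sigma = torch.relu(out[..., 3:])    # non-negative volume density
        return torch.cat([rgb, sigma], dim=-1)

# Usage: query the field at ray samples, broadcasting one expression code
# per frame; varying `expr` changes the synthesized facial expression.
model = ExpressionConditionedNeRF()
pts = torch.rand(1024, 3)
expr = torch.zeros(1024, 76)      # all-zero code ~ neutral expression
rgb_sigma = model(pts, expr)      # (1024, 4), fed to volume rendering
```

The design choice this highlights is that expression control enters only through the low-dimensional 3DMM code, which is what lets the rest of the volumetric function stay tied to scene appearance and geometry.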