Photo-realistic video portrait reenactment benefits virtual production and numerous VR/AR experiences. The task remains challenging because the reenacted expression should match the source performance while the lighting should be adjustable to new environments. We present a neural relighting and expression transfer technique that transfers facial expressions from a source performer to a portrait video of a target performer while enabling dynamic relighting. Our approach combines 4D reflectance field learning, model-based facial performance capture, and target-aware neural rendering. Specifically, given a short one-light-at-a-time (OLAT) sequence of the target performer, we apply a rendering-to-video translation network to first synthesize the OLAT results of new sequences with unseen expressions. We then design a semantic-aware facial normalization scheme, together with a multi-frame multi-task learning strategy, to encode content, segmentation, and motion flows for reliably inferring the reflectance field. This allows us to simultaneously control the facial expression and apply virtual relighting. Extensive experiments demonstrate that our technique robustly handles challenging expressions and lighting environments and produces results of cinematographic quality.
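The abstract does not come with code, but the multi-frame multi-task strategy it mentions can be sketched concretely: a shared encoder consumes a stack of consecutive normalized rendering frames, and separate decoder heads predict image content, a facial segmentation mask, and a 2D motion flow. The sketch below is a minimal, hypothetical PyTorch illustration of that structure; all class names, layer counts, and channel sizes are assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiTaskOLATNet(nn.Module):
    """Hypothetical sketch of a multi-frame multi-task design:
    a shared encoder over stacked normalized rendering frames,
    with separate heads for content, segmentation, and motion flow."""
    def __init__(self, in_frames=3, base=32):
        super().__init__()
        # Shared encoder over a temporal stack of RGB frames.
        self.encoder = nn.Sequential(
            nn.Conv2d(3 * in_frames, base, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

        def head(out_ch):
            # Lightweight decoder restoring the input resolution.
            return nn.Sequential(
                nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(base, out_ch, 4, stride=2, padding=1),
            )

        self.content_head = head(3)  # RGB image content
        self.seg_head = head(1)      # facial segmentation mask (logits)
        self.flow_head = head(2)     # 2D motion flow between frames

    def forward(self, frames):
        # frames: (B, 3 * in_frames, H, W), H and W divisible by 4
        feat = self.encoder(frames)
        return self.content_head(feat), self.seg_head(feat), self.flow_head(feat)

# Usage sketch: three stacked 256x256 frames in, three task outputs out.
net = MultiTaskOLATNet()
content, seg, flow = net(torch.randn(1, 9, 256, 256))
```

Sharing the encoder across the three tasks is the usual motivation for such a design: the segmentation and flow supervision regularize the features used to infer the reflectance field, which the abstract credits for the method's reliability.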