This paper presents a method for riggable 3D face reconstruction from monocular images that jointly estimates a personalized face rig and per-image parameters, including expression, pose, and illumination. To achieve this goal, we design an end-to-end trainable network with an embedded differentiable in-network optimization. The network first parameterizes the face rig as a compact latent code with a neural decoder, and then estimates the latent code and the per-image parameters via a learnable optimization. By estimating a personalized face rig, our method goes beyond static reconstruction and enables downstream applications such as video retargeting. The in-network optimization explicitly enforces constraints derived from first principles and thus introduces priors beyond those available to regression-based methods. Finally, data-driven priors from deep learning are used to constrain the ill-posed monocular setting and to ease the optimization. Experiments demonstrate that our method achieves state-of-the-art reconstruction accuracy with reasonable robustness and generalization ability, and that it supports standard face rig applications.