While deep learning-based 3D face generation has made progress recently, the problem of dynamic 3D (4D) facial expression synthesis has been less investigated. In this paper, we propose a novel solution to the following question: given one input 3D neutral face, can we generate dynamic 3D (4D) facial expressions from it? To tackle this problem, we first propose a mesh encoder-decoder architecture (Expr-ED) that exploits a set of 3D landmarks to generate an expressive 3D face from its neutral counterpart. Then, we extend it to 4D by modeling the temporal dynamics of facial expressions with a manifold-valued GAN (Motion3DGAN) capable of generating a sequence of 3D landmarks from an expression label. The generated landmarks are fed into the mesh encoder-decoder, ultimately producing a sequence of 3D expressive faces. By decoupling these two steps, we address separately the non-linearity induced by the mesh deformation and by the motion dynamics. Experimental results on the CoMA dataset show that our landmark-guided mesh encoder-decoder yields a significant improvement over other landmark-based 3D fitting approaches, and that our framework generates high-quality dynamic facial expressions. The framework further allows the 3D expression intensity to be continuously adapted from low to high. Finally, we show that our framework can be applied to other tasks, such as 2D-3D facial expression transfer.
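To make the two-stage pipeline concrete, the following is a minimal PyTorch sketch of the inference flow described above: an expression label drives a landmark-sequence generator, and each generated landmark frame conditions the mesh encoder-decoder that deforms the neutral face. All module internals, dimensions, and names other than Motion3DGAN and Expr-ED are hypothetical placeholders, not the authors' implementation; in particular, the real Motion3DGAN is manifold-valued and the real Expr-ED uses mesh convolutions, whereas plain MLPs stand in here for illustration.

```python
# Hypothetical sketch of the two-stage 4D expression pipeline.
# Dimensions and architectures are assumptions, not the paper's code.
import torch
import torch.nn as nn

NUM_LANDMARKS = 68      # assumed landmark count
NUM_VERTICES = 5023     # CoMA-style mesh resolution (assumption)
NUM_FRAMES = 30         # assumed sequence length
LATENT_DIM = 128
NUM_EXPRESSIONS = 12    # assumed number of expression labels


class Motion3DGAN(nn.Module):
    """Generator sketch: maps noise + expression label to a sequence of
    3D landmark configurations of shape (T, K, 3). The actual model is
    a manifold-valued GAN; a Euclidean MLP stands in here."""
    def __init__(self):
        super().__init__()
        self.label_embed = nn.Embedding(NUM_EXPRESSIONS, LATENT_DIM)
        self.net = nn.Sequential(
            nn.Linear(2 * LATENT_DIM, 512), nn.ReLU(),
            nn.Linear(512, NUM_FRAMES * NUM_LANDMARKS * 3),
        )

    def forward(self, z, label):
        h = torch.cat([z, self.label_embed(label)], dim=-1)
        out = self.net(h)
        return out.view(-1, NUM_FRAMES, NUM_LANDMARKS, 3)


class ExprED(nn.Module):
    """Mesh encoder-decoder sketch (Expr-ED): predicts per-vertex
    offsets for the neutral mesh, conditioned on target 3D landmarks."""
    def __init__(self):
        super().__init__()
        self.mesh_enc = nn.Linear(NUM_VERTICES * 3, LATENT_DIM)
        self.lmk_enc = nn.Linear(NUM_LANDMARKS * 3, LATENT_DIM)
        self.dec = nn.Linear(2 * LATENT_DIM, NUM_VERTICES * 3)

    def forward(self, neutral_mesh, landmarks):
        zm = self.mesh_enc(neutral_mesh.flatten(1))
        zl = self.lmk_enc(landmarks.flatten(1))
        offsets = self.dec(torch.cat([zm, zl], dim=-1))
        return neutral_mesh + offsets.view(-1, NUM_VERTICES, 3)


# Inference: one neutral face -> a 4D expressive sequence.
gen, expr_ed = Motion3DGAN(), ExprED()
neutral = torch.zeros(1, NUM_VERTICES, 3)             # placeholder neutral mesh
lmk_seq = gen(torch.randn(1, LATENT_DIM), torch.tensor([3]))
frames = [expr_ed(neutral, lmk_seq[:, t]) for t in range(NUM_FRAMES)]
sequence = torch.stack(frames, dim=1)                 # (1, T, V, 3)
```

The decoupling the abstract emphasizes is visible in the structure: the generator handles motion dynamics over the low-dimensional landmark set, while Expr-ED handles the high-dimensional, non-linear mesh deformation one frame at a time.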