Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. This task, which has traditionally relied heavily on digital craftspersons, remains largely unexplored by automatic approaches. In this paper, we introduce a generative framework for producing 3D facial expression sequences (i.e. 4D faces) that can be conditioned on different inputs to animate an arbitrary 3D face mesh. It comprises two tasks: (1) learning a generative model over a set of 3D landmark sequences, and (2) generating 3D mesh sequences of an input facial mesh, driven by the generated landmark sequences. The generative model is based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved remarkable success in generative tasks in other domains. Although it can be trained unconditionally, its reverse process can still be conditioned on various signals. This allows us to efficiently develop several downstream conditional generation tasks, using expression labels, text, partial sequences, or simply a facial geometry as the condition. To obtain the full mesh deformation, we then develop a landmark-guided encoder-decoder that applies the geometric deformation embedded in the landmarks to a given facial mesh. Experiments show that our model learns to generate realistic, high-quality expressions from a relatively small dataset alone, improving over state-of-the-art methods. Videos and qualitative comparisons with other methods can be found at https://github.com/ZOUKaifeng/4DFM. Code and models will be made available upon acceptance.
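For reference, and not as the paper's exact parameterization, a DDPM in the style of Ho et al. (2020) learns a noise predictor \(\epsilon_\theta\) and generates samples by iterating a reverse denoising step into which a condition signal \(c\) (e.g. an expression label or text embedding) can be injected:
\[
\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(\mathbf{x}_t, t, c)\right) + \sigma_t \mathbf{z}, \qquad \mathbf{z}\sim\mathcal{N}(\mathbf{0},\mathbf{I}),
\]
where \(\mathbf{x}_T \sim \mathcal{N}(\mathbf{0},\mathbf{I})\), \(\alpha_t = 1-\beta_t\), \(\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s\) for a noise schedule \(\{\beta_t\}\), and \(\mathbf{x}_0\) is the generated 3D landmark sequence that subsequently drives the mesh deformation.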