Impressive progress has been made recently in audio-driven 3D facial animation, but synthesizing a 3D talking head with rich emotion remains unsolved. This is due to the lack of 3D generative models and of available 3D emotional datasets with synchronized audio. To address this, we introduce 3D-TalkEmo, a deep neural network that generates 3D talking-head animation with various emotions. Using sophisticated 3D face reconstruction methods, we also create a large 3D dataset with synchronized audio and video, a rich corpus, and varied emotional states across different subjects. In the emotion generation network, we propose a novel 3D face representation, the geometry map, obtained via classical multi-dimensional scaling. It maps the coordinates of the vertices of a 3D face onto a canonical image plane while preserving the vertex-to-vertex geodesic distance metric in a least-squares sense. This maintains the adjacency relationships among vertices and provides an effective convolutional structure for the 3D facial surface. Taking a neutral 3D mesh and a speech signal as inputs, 3D-TalkEmo generates vivid facial animations; moreover, it allows the emotion state of the animated speaker to be changed. We present extensive quantitative and qualitative evaluations of our method, together with user studies, demonstrating that the generated talking heads are of significantly higher quality than those produced by previous state-of-the-art methods.
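The geometry map described above rests on classical multi-dimensional scaling (MDS): given a matrix of pairwise distances, find low-dimensional coordinates whose Euclidean distances approximate it in a least-squares sense. The following is a minimal NumPy sketch of classical MDS itself, demonstrated on Euclidean distances of a toy point set; it is not the authors' full pipeline, which applies MDS to geodesic distances on the facial mesh.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: embed n points into `dim` dimensions so that
    pairwise Euclidean distances approximate the (n, n) symmetric
    distance matrix D in a least-squares sense."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dim]  # keep the largest eigenvalues
    L = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * L             # (n, dim) embedding

# Toy example: four points on a unit square. Their distance matrix is
# exactly embeddable in 2D, so MDS recovers it (up to a rigid motion).
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
X = classical_mds(D, dim=2)
D_rec = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(np.allclose(D, D_rec))
```

For a mesh, D would instead hold geodesic distances between vertices, and the resulting 2D coordinates define the canonical image plane on which standard 2D convolutions can operate.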