As 3D facial avatars become more widely used for communication, it is critical that they faithfully convey emotion. Unfortunately, the best recent methods that regress parametric 3D face models from monocular images are unable to capture the full spectrum of facial expression, such as subtle or extreme emotions. We find the standard reconstruction metrics used for training (landmark reprojection error, photometric error, and face recognition loss) are insufficient to capture high-fidelity expressions. The result is facial geometries that do not match the emotional content of the input image. We address this with EMOCA (EMOtion Capture and Animation), by introducing a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image. While EMOCA achieves 3D reconstruction errors that are on par with the current best methods, it significantly outperforms them in terms of the quality of the reconstructed expression and the perceived emotional content. We also directly regress levels of valence and arousal and classify basic expressions from the estimated 3D face parameters. On the task of in-the-wild emotion recognition, our purely geometric approach is on par with the best image-based methods, highlighting the value of 3D geometry in analyzing human behavior. The model and code are publicly available at https://emoca.is.tue.mpg.de.
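The key technical idea, the deep perceptual emotion consistency loss, compares emotion features extracted from the input image with those extracted from a rendering of the reconstructed 3D face. Below is a minimal PyTorch sketch of this idea, assuming a frozen pretrained emotion recognition network `emotion_net` that maps an image batch to a feature embedding; the class and argument names are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class EmotionConsistencyLoss(nn.Module):
    """Perceptual emotion consistency loss (sketch).

    Penalizes the distance between emotion features of the input image
    and emotion features of the rendered 3D reconstruction.

    `emotion_net` is assumed to be a pretrained, frozen emotion
    recognition network mapping an image batch (B, 3, H, W) to a
    feature embedding (B, D).
    """

    def __init__(self, emotion_net: nn.Module):
        super().__init__()
        self.emotion_net = emotion_net.eval()
        # The emotion network only provides the perceptual metric;
        # it is not updated during training.
        for p in self.emotion_net.parameters():
            p.requires_grad = False

    def forward(self, input_image: torch.Tensor,
                rendered_image: torch.Tensor) -> torch.Tensor:
        # Features of the real input image are a fixed target.
        with torch.no_grad():
            target_features = self.emotion_net(input_image)
        # Gradients flow through the rendered image back into the
        # regressed 3D face parameters.
        predicted_features = self.emotion_net(rendered_image)
        return torch.mean((predicted_features - target_features) ** 2)
```

In training, `rendered_image` would come from a differentiable renderer applied to the regressed face model, so minimizing this loss pushes the reconstructed 3D expression toward the emotional content of the input image.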