Visual emotion expression plays an important role in audiovisual speech communication. In this work, we propose a novel approach to rendering visual emotion expression in speech-driven talking face generation. Specifically, we design an end-to-end talking face generation system that takes a speech utterance, a single face image, and a categorical emotion label as input and renders a talking face video that is synchronized with the speech and expresses the conditioned emotion. Objective evaluation of image quality, audiovisual synchronization, and visual emotion expression shows that the proposed system outperforms a state-of-the-art baseline system. Subjective evaluation of visual emotion expression and video realness also demonstrates the superiority of the proposed system. Furthermore, we conduct a human emotion recognition pilot study using generated videos in which the emotions of the audio and visual modalities are mismatched. Results show that on this task, humans respond significantly more strongly to the visual modality than to the audio modality.