Talking Head Generation with Audio and Speech Related Facial Action Units - 专知 (Zhuanzhi) Papers

INFORMS · Generator network · MoDELS · Correlation coefficient · Model evaluation

October 19, 2021

Talking Head Generation with Audio and Speech Related Facial Action Units

Sen Chen, Zhilei Liu, Jiaxing Liu, Zhengxiang Yan, Longbiao Wang

From arXiv; accepted by BMVC 2021

The task of talking head generation is to synthesize a lip-synchronized talking head video from an arbitrary face image and audio clips. Most existing methods ignore the local driving information of the mouth muscles. In this paper, we propose a novel recurrent generative network that uses both audio and speech-related facial action units (AUs) as the driving information. AU information related to the mouth can guide the movement of the mouth more accurately. Since speech is highly correlated with speech-related AUs, we propose an Audio-to-AU module in our system to predict the speech-related AU information from speech. In addition, we use an AU classifier to ensure that the generated images contain correct AU information. A frame discriminator is also constructed for adversarial training to improve the realism of the generated face. We verify the effectiveness of our model on the GRID and TCD-TIMIT datasets. We also conduct an ablation study to verify the contribution of each component in our model. Quantitative and qualitative experiments demonstrate that our method outperforms existing methods in both image quality and lip-sync accuracy.
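The abstract names three training signals for the generator: the generated frame should resemble the target, the frame discriminator should judge it realistic, and the AU classifier should recover the correct speech-related AUs from it. The abstract does not give the loss functions or their weights, so the following is only a minimal sketch of how such terms are commonly combined. The function names, the L1 reconstruction term, the binary cross-entropy terms, and the weights `w_rec`, `w_adv`, and `w_au` are all assumptions made for illustration, not the paper's formulation.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def l1(a, b):
    """Mean absolute difference between two flattened frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def generator_loss(fake_frame, real_frame, d_fake_prob, au_pred, au_target,
                   w_rec=1.0, w_adv=1.0, w_au=1.0):
    """Hypothetical combined generator objective (weights are assumptions).

    fake_frame, real_frame: flattened pixel values of generated/target frames
    d_fake_prob: frame discriminator's probability that the fake frame is real
    au_pred: AU classifier's per-AU activation probabilities on the fake frame
    au_target: 0/1 labels for the speech-related AUs
    """
    rec = l1(fake_frame, real_frame)   # image fidelity
    adv = bce([d_fake_prob], [1.0])    # try to fool the frame discriminator
    au = bce(au_pred, au_target)       # generated frame should show correct AUs
    return w_rec * rec + w_adv * adv + w_au * au


# Toy usage with made-up numbers: a 4-pixel "frame" and two AUs.
loss = generator_loss(
    fake_frame=[0.2, 0.4, 0.6, 0.8],
    real_frame=[0.2, 0.5, 0.6, 0.7],
    d_fake_prob=0.6,
    au_pred=[0.9, 0.1],
    au_target=[1.0, 0.0],
)
print(loss)
```

In this sketch, a frame whose AU predictions drift away from the targets raises the total loss even when its pixels are unchanged, which is the role the abstract assigns to the AU classifier.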
