FaceFormer: Speech-Driven 3D Facial Animation with Transformers - Zhuanzhi (专知) Papers

December 10, 2021

FaceFormer: Speech-Driven 3D Facial Animation with Transformers

Yingruo Fan, Zhaojiang Lin, Jun Saito, Wenping Wang, Taku Komura

Speech-driven 3D facial animation is challenging due to the complex geometry of human faces and the limited availability of 3D audio-visual data. Prior works typically focus on learning phoneme-level features of short audio windows with limited context, occasionally resulting in inaccurate lip movements. To tackle this limitation, we propose a Transformer-based autoregressive model, FaceFormer, which encodes the long-term audio context and autoregressively predicts a sequence of animated 3D face meshes. To cope with the data scarcity issue, we integrate self-supervised pre-trained speech representations. We also devise two biased attention mechanisms well suited to this specific task: the biased cross-modal multi-head (MH) attention, and the biased causal MH self-attention with a periodic positional encoding strategy. The former effectively aligns the audio and motion modalities, whereas the latter enables generalization to longer audio sequences. Extensive experiments and a perceptual user study show that our approach outperforms existing state-of-the-art methods. The code will be made available.
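The abstract names a periodic positional encoding and two biased attention masks but gives no formulas. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. The function names, the `period` parameter, the sinusoidal form, the linearly growing penalty per elapsed period, and the one-to-one audio-to-motion frame alignment are all assumptions made for illustration.

```python
import math


def periodic_positional_encoding(seq_len, d_model, period):
    """Sinusoidal encoding whose position index wraps every `period` steps (assumed form)."""
    table = []
    for t in range(seq_len):
        pos = t % period  # wrap the index so long sequences reuse seen positions
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def biased_causal_mask(seq_len, period):
    """Additive self-attention bias (assumed form).

    Future frames get -inf (causal). Past frames get a penalty that grows
    by 1 for every full `period` of distance, favouring recent context.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            if j > i:
                row.append(float("-inf"))
            else:
                row.append(float(-((i - j) // period)))
        mask.append(row)
    return mask


def alignment_mask(num_motion_frames, num_audio_frames):
    """Additive cross-attention bias (assumed form).

    Motion frame i may attend only to audio frame i, assuming the audio
    features were already resampled to the motion frame rate.
    """
    return [
        [0.0 if i == j else float("-inf") for j in range(num_audio_frames)]
        for i in range(num_motion_frames)
    ]
```

Each mask would be added to the attention logits before the softmax, so `-inf` entries receive zero weight. Because the encoding depends only on `t % period`, a sequence longer than any seen in training introduces no new position values, which is one way the generalization to longer audio described above could arise.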
