LipSync3D:利用胶片和灯光正常化录像中个人化的3D说话面部的数据有效学习 (LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization) - 专知论文

会员服务 ·

0

规范化的 · 3D · 学成 · state-of-the-art · 逼真度 ·

2021 年 6 月 8 日

LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization

翻译：LipSync3D:利用胶片和灯光正常化录像中个人化的3D说话面部的数据有效学习

Avisek Lahiri,Vivek Kwatra,Christian Frueh,John Lewis,Chris Bregler

from arxiv, Accepted to IEEE CVPR 2021. Brief demo video available at: https://www.youtube.com/watch?v=L1StbX9OznY

In this paper, we present a video-based learning framework for animating personalized 3D talking faces from audio. We introduce two training-time data normalizations that significantly improve data sample efficiency. First, we isolate and represent faces in a normalized space that decouples 3D geometry, head pose, and texture. This decomposes the prediction problem into regressions over the 3D face shape and the corresponding 2D texture atlas. Second, we leverage facial symmetry and approximate albedo constancy of skin to isolate and remove spatio-temporal lighting variations. Together, these normalizations allow simple networks to generate high fidelity lip-sync videos under novel ambient illumination while training with just a single speaker-specific video. Further, to stabilize temporal dynamics, we introduce an auto-regressive approach that conditions the model on its previous visual state. Human ratings and objective metrics demonstrate that our method outperforms contemporary state-of-the-art audio-driven video reenactment benchmarks in terms of realism, lip-sync and visual quality scores. We illustrate several applications enabled by our framework.

翻译：在本文中,我们展示了一个基于视频的学习框架,以从音频中激发个性化的 3D 说话面孔。我们引入了两个培训时间的数据正常化, 大大提高了数据样本效率。首先, 我们分离并代表了在一个解开 3D 几何、头部姿势和纹理的正常空间中面孔。这将预测问题分解成3D 脸形和相应的 2D 纹理图集的倒退。第二, 我们利用面部对称和皮肤近似反比振动粘合, 孤立并消除时空照明变化。这些正常化使简单的网络能够在新的环境污染下生成高度忠实的双向同步视频, 而仅用单一的单个演讲者专用视频进行培训。此外, 为了稳定时间动态, 我们引入了一种自动反向的方法, 将模型以其先前的视觉状态作为条件。人类的评级和客观衡量尺度表明, 我们的方法在现实主义、唇合成和视觉质量计分数框架中超越了当代状态驱动的视频再反应基准。我们用了一些应用来说明。

0

相关内容

规范化的

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

专知会员服务

40+阅读 · 2020年9月21日

【CVPR2020-Facebook】从检测到3D目标，FroDO: From Detections to 3D Objects

【CVPR2020-Facebook】从检测到3D目标，FroDO: From Detections to 3D Objects

专知会员服务

33+阅读 · 2020年5月12日

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

专知会员服务

22+阅读 · 2020年3月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【厦门大学】综述：深度学习3D点云分割，Review: deep learning on 3D point clouds

【厦门大学】综述：深度学习3D点云分割，Review: deep learning on 3D point clouds

专知会员服务

71+阅读 · 2020年1月22日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

【Google】视频诱导视觉不变性的自监督学习（Self-Supervised Learning of Video-Induced Visual Invariances），谷歌博士后研究员| Michael Tschannen等

【Google】视频诱导视觉不变性的自监督学习（Self-Supervised Learning of Video-Induced Visual Invariances），谷歌博士后研究员| Michael Tschannen等

专知会员服务

12+阅读 · 2019年12月8日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

专知

16+阅读 · 2018年5月14日

IEEE2018|An Accurate and Real-time 3D Tracking System for Robots

IEEE2018|An Accurate and Real-time 3D Tracking System for Robots

极市平台

4+阅读 · 2018年4月19日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

【学习】CVPR 2017 Tutorial：如何从图像来构建3D模型

【学习】CVPR 2017 Tutorial：如何从图像来构建3D模型

机器学习研究会

6+阅读 · 2017年8月8日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

SSPU-Net: Self-Supervised Point Cloud Upsampling via Differentiable Rendering

SSPU-Net: Self-Supervised Point Cloud Upsampling via Differentiable Rendering

Arxiv

0+阅读 · 2021年8月3日

Self-supervised Audiovisual Representation Learning for Remote Sensing Data

Arxiv

0+阅读 · 2021年8月2日

A 3D model-based approach for fitting masks to faces in the wild

Arxiv

0+阅读 · 2021年8月2日

Self-supervised Video Representation Learning by Context and Motion Decoupling

Arxiv

6+阅读 · 2021年4月2日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images

Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images

Arxiv

5+阅读 · 2019年3月8日

Representation Learning with Contrastive Predictive Coding

Arxiv

6+阅读 · 2019年1月22日

Video-to-Video Synthesis

Video-to-Video Synthesis

Arxiv

9+阅读 · 2018年8月20日

Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

Arxiv

3+阅读 · 2018年5月1日

Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

Arxiv

8+阅读 · 2017年11月22日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

专知会员服务

40+阅读 · 2020年9月21日

【CVPR2020-Facebook】从检测到3D目标，FroDO: From Detections to 3D Objects

【CVPR2020-Facebook】从检测到3D目标，FroDO: From Detections to 3D Objects

专知会员服务

33+阅读 · 2020年5月12日

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

专知会员服务

22+阅读 · 2020年3月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【厦门大学】综述：深度学习3D点云分割，Review: deep learning on 3D point clouds

【厦门大学】综述：深度学习3D点云分割，Review: deep learning on 3D point clouds

专知会员服务

71+阅读 · 2020年1月22日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

【Google】视频诱导视觉不变性的自监督学习（Self-Supervised Learning of Video-Induced Visual Invariances），谷歌博士后研究员| Michael Tschannen等

【Google】视频诱导视觉不变性的自监督学习（Self-Supervised Learning of Video-Induced Visual Invariances），谷歌博士后研究员| Michael Tschannen等

专知会员服务

12+阅读 · 2019年12月8日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【ACL2025教程】大语言模型的护栏与安全性：对其应用的安全、可靠与可控引导

《实现协同自主：从人机协作到多智能体系统》最新190页

【ICML2025】SToFM：一种用于空间转录组学的多尺度基础模型

通信网络智能体白皮书V1.0，61页pdf

相关资讯

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

专知

16+阅读 · 2018年5月14日

IEEE2018|An Accurate and Real-time 3D Tracking System for Robots

IEEE2018|An Accurate and Real-time 3D Tracking System for Robots

极市平台

4+阅读 · 2018年4月19日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

【学习】CVPR 2017 Tutorial：如何从图像来构建3D模型

【学习】CVPR 2017 Tutorial：如何从图像来构建3D模型

机器学习研究会

6+阅读 · 2017年8月8日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

SSPU-Net: Self-Supervised Point Cloud Upsampling via Differentiable Rendering

SSPU-Net: Self-Supervised Point Cloud Upsampling via Differentiable Rendering

Arxiv

0+阅读 · 2021年8月3日

Self-supervised Audiovisual Representation Learning for Remote Sensing Data

Arxiv

0+阅读 · 2021年8月2日

A 3D model-based approach for fitting masks to faces in the wild

Arxiv

0+阅读 · 2021年8月2日

Self-supervised Video Representation Learning by Context and Motion Decoupling

Arxiv

6+阅读 · 2021年4月2日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images

Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images

Arxiv

5+阅读 · 2019年3月8日

Representation Learning with Contrastive Predictive Coding

Arxiv

6+阅读 · 2019年1月22日

Video-to-Video Synthesis

Video-to-Video Synthesis

Arxiv

9+阅读 · 2018年8月20日

Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

Arxiv

3+阅读 · 2018年5月1日

Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

Arxiv

8+阅读 · 2017年11月22日

微信扫码咨询专知VIP会员