MusicFace: Music-driven Expressive Singing Face Synthesis


Music · Expressiveness · Decomposition · Synthesis · Motion generation

March 24, 2023


Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng

From arXiv; accepted to CVMJ

Synthesizing a vivid and realistic singing face driven by a music signal remains an interesting and challenging problem. In this paper, we present a method for this task that produces natural motions of the lips, facial expression, head pose, and eye states. Because human voice and background music are coupled in common music audio signals, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into a human voice stream and a background music stream. Because the correlation between the two input streams and the dynamics of facial expressions, head motions, and eye states is implicit and complicated, we model their relationship with an attention scheme in which the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we decompose head movement generation into speed generation and direction generation, and decompose eye state generation into short-time eye blinking generation and long-time eye closing generation, modeling each separately. We also build a novel SingingFace Dataset to support training and evaluation for this task and to facilitate future work on this topic. Extensive experiments and a user study show that our method synthesizes vivid singing faces and outperforms state-of-the-art methods both qualitatively and quantitatively.
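The two decompositions named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how poses or eye states are represented, so the code below assumes a head pose is a list of three angles per frame, that "speed" is the magnitude of the frame-to-frame pose change and "direction" its unit vector, and that a closed-eye run counts as a blink when it lasts at most a hypothetical `max_blink_frames` frames. All function and parameter names are made up for illustration.

```python
import math


def decompose_head_motion(poses):
    """Split frame-to-frame pose changes into a scalar speed and a unit direction.

    Assumption: each pose is a list of angles (e.g. yaw, pitch, roll).
    """
    speeds, directions = [], []
    for prev, curr in zip(poses, poses[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        speed = math.sqrt(sum(d * d for d in delta))
        # A zero change has no defined direction; use a zero vector as a placeholder.
        direction = [d / speed for d in delta] if speed > 0 else [0.0] * len(delta)
        speeds.append(speed)
        directions.append(direction)
    return speeds, directions


def recompose_head_motion(start, speeds, directions):
    """Rebuild the pose sequence from a start pose plus speed and direction."""
    poses = [list(start)]
    for speed, direction in zip(speeds, directions):
        poses.append([p + speed * d for p, d in zip(poses[-1], direction)])
    return poses


def split_eye_states(closed, max_blink_frames=5):
    """Separate closed-eye frames into short blinks and long closures.

    Assumption: `closed` is a per-frame 0/1 sequence; the threshold is hypothetical.
    """
    blink = [0] * len(closed)
    long_close = [0] * len(closed)
    i = 0
    while i < len(closed):
        if not closed[i]:
            i += 1
            continue
        j = i
        while j < len(closed) and closed[j]:
            j += 1  # extend to the end of this closed-eye run
        target = blink if (j - i) <= max_blink_frames else long_close
        for k in range(i, j):
            target[k] = 1
        i = j
    return blink, long_close
```

Under these assumptions, the speed and direction sequences could be predicted by separate models and recombined with `recompose_head_motion`, and the two eye-state tracks could likewise be modeled separately and merged with a per-frame logical OR.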

