VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

February 18, 2022

Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng

From arXiv. Accepted to ICASSP 2022. Demo page: https://wendison.github.io/VCVTS-demo/

Though significant progress has been made in speaker-dependent Video-to-Speech (VTS) synthesis, little attention has been devoted to multi-speaker VTS, which maps silent video to speech while allowing flexible control of speaker identity, all in a single system. This paper proposes a novel multi-speaker VTS system based on cross-modal knowledge transfer from voice conversion (VC). Vector quantization with contrastive predictive coding (VQCPC) is used in the content encoder of VC to derive discrete, phoneme-like acoustic units, which are transferred to a Lip-to-Index (Lip2Ind) network that infers the index sequence of acoustic units. The Lip2Ind network can then substitute for the content encoder of VC, forming a multi-speaker VTS system that converts silent video to acoustic units for reconstructing accurate spoken content. The VTS system also inherits the advantages of VC: a speaker encoder produces speaker representations that effectively control the speaker identity of the generated speech. Extensive evaluations verify the effectiveness of the proposed approach, which can be applied under both constrained-vocabulary and open-vocabulary conditions and achieves state-of-the-art performance in generating high-quality speech with high naturalness, intelligibility, and speaker similarity. Our demo page is released here: https://wendison.github.io/VCVTS-demo/
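To make the unit-substitution idea concrete, here is a toy sketch in plain Python. It is not the authors' implementation. It assumes a small made-up codebook, nearest-neighbour assignment under squared Euclidean distance (the usual vector-quantization rule), and a decoder that conditions on speaker identity by concatenating a speaker embedding to every unit vector. The functions `quantize`, `lookup`, and `decoder_input` are hypothetical names. CPC training, the speaker encoder, and the Lip2Ind network itself are not modeled; the Lip2Ind predictions are simply hard-coded.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """VQ step: map each continuous frame to the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


def lookup(indices: Sequence[int],
           codebook: Sequence[Sequence[float]]) -> List[Vector]:
    """Turn an index sequence back into the corresponding codeword vectors."""
    return [list(codebook[i]) for i in indices]


def decoder_input(indices: Sequence[int],
                  codebook: Sequence[Sequence[float]],
                  speaker_embedding: Sequence[float]) -> List[Vector]:
    """Hypothetical decoder input: each unit vector concatenated with the speaker vector."""
    return [unit + list(speaker_embedding) for unit in lookup(indices, codebook)]


# Toy 2-D codebook with 3 "acoustic units" (made-up values for illustration).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# VC path: continuous content features from audio -> discrete unit indices.
audio_frames = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8], [1.1, 0.1]]
vc_indices = quantize(audio_frames, codebook)

# VTS path: a Lip2Ind-style model predicts indices directly from video.
# Here we simply pretend its predictions are these hard-coded values.
lip2ind_indices = [0, 1, 2, 1]

speaker = [0.5, -0.5]  # made-up speaker embedding

print(vc_indices)  # [0, 1, 2, 1]
print(decoder_input(lip2ind_indices, codebook, speaker)
      == decoder_input(vc_indices, codebook, speaker))  # True
```

Because the decoder only sees codebook entries selected by index, any model that predicts the same index sequence, whether the VC content encoder or a lip-reading network, yields the same decoder input. The speaker vector stays a separate argument, so it can be swapped independently of the content.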

翻译：尽管对依赖语音的视频到语音合成(VTS)取得了显著进展,但对能够将静音视频映射成语音的多发式VTS的多发式VTS却很少注意,这些视频可以将静音视频映射成语音,同时允许在一个单一系统中灵活控制语音身份。本文提议建立一个新型的多发式VTS系统,其基础是声音转换(VC)的跨模式知识传输,其中矢量量化与对比性预测编码(VQCPC)用于VC的内容编码器,以生成离散的语音类似的声音设备,这些设备被转移到Lipto-Index(Lip2Ind)网络,以推断音响器的索引序列。Lip2Ind网络随后可以取代VC的内容编码器,形成一个多发式VTS系统,将静音视频转换为音设备,以重建准确的语音内容。VTS系统还继承了VC的优势,其方法是使用发言人解说器制作语音演示,以有效控制所生成的演讲者身份。广泛的评价可以核实拟议方法的有效性,在高品质和高品质中应用。