LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading - 专知论文

会员服务 ·

0

HTTPS · 模态 · 数据集 · 词表 · Projection ·

2023 年 6 月 5 日

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

翻译：暂无翻译

Yochai Yemini,Aviv Shamsian,Lior Bracha,Sharon Gannot,Ethan Fetaya

Lip-to-speech involves generating a natural-sounding speech synchronized with a soundless video of a person talking. Despite recent advances, current methods still cannot produce high-quality speech with high levels of intelligibility for challenging and realistic datasets such as LRS3. In this work, we present LipVoicer, a novel method that generates high-quality speech, even for in-the-wild and rich datasets, by incorporating the text modality. Given a silent video, we first predict the spoken text using a pre-trained lip-reading network. We then condition a diffusion model on the video and use the extracted text through a classifier-guidance mechanism where a pre-trained ASR serves as the classifier. LipVoicer outperforms multiple lip-to-speech baselines on LRS2 and LRS3, which are in-the-wild datasets with hundreds of unique speakers in their test set and an unrestricted vocabulary. Moreover, our experiments show that the inclusion of the text modality plays a major role in the intelligibility of the produced speech, readily perceptible while listening, and is empirically reflected in the substantial reduction of the WER metric. We demonstrate the effectiveness of LipVoicer through human evaluation, which shows that it produces more natural and synchronized speech signals compared to competing methods. Finally, we created a demo showcasing LipVoicer's superiority in producing natural, synchronized, and intelligible speech, providing additional evidence of its effectiveness. Project page: https://lipvoicer.github.io

翻译：暂无翻译

0

相关内容

HTTPS

超文本传输安全协议是超文本传输协议和 SSL/TLS 的组合，用以提供加密通讯及对网络服务器身份的鉴定。

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

拟南芥FT调控开花的分子机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

副溶血弧菌VI型分泌系统的表型功能及基因调控研究

国家自然科学基金

1+阅读 · 2014年12月31日

GTAT4和Myocardin相互作用调控心肌肥厚

国家自然科学基金

0+阅读 · 2014年12月31日

当归六黄汤免疫调控胰岛素抵抗的作用机制及配伍研究

国家自然科学基金

0+阅读 · 2014年12月31日

免疫性肝损伤过程中细胞色素P450酶系下调的转录水平调节和翻译后蛋白修饰研究

国家自然科学基金

0+阅读 · 2014年12月31日

BER通路基因miRNA结合位点基因多态性与结直肠癌易感性的关联及功能研究

国家自然科学基金

0+阅读 · 2013年12月31日

人泡沫病毒LTR与细胞锌指蛋白ZNF268相互作用机制及功能研究

国家自然科学基金

0+阅读 · 2012年12月31日

猪β-防御素特异性调控免疫靶细胞的筛选及调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

斑马鱼miR-142a-3p影响及调控造血干细胞发育机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

调控家蚕发育非编码RNA（non-coding RNA, ncRNA）的功能解析

国家自然科学基金

0+阅读 · 2011年12月31日

ChatHome: Development and Evaluation of a Domain-Specific Language Model for Home Renovation

Arxiv

0+阅读 · 2023年7月28日

A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition

Arxiv

0+阅读 · 2023年7月28日

RCT Rejection Sampling for Causal Estimation Evaluation

Arxiv

0+阅读 · 2023年7月27日

Automatic Emotion Experiencer Recognition

Arxiv

1+阅读 · 2023年7月27日

Using Gameplay Videos for Detecting Issues in Video Games

Arxiv

0+阅读 · 2023年7月27日

Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking

Arxiv

0+阅读 · 2023年7月26日

Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion

Arxiv

0+阅读 · 2023年7月26日

FedNoisy: Federated Noisy Label Learning Benchmark

Arxiv

0+阅读 · 2023年7月25日

Direct Speech Translation for Automatic Subtitling

Arxiv

0+阅读 · 2023年7月25日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

VIP会员

文章信息

相关主题

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

《海上自主水面船舶远程操作中心：安全可持续运行的多维度分析》

多模态大语言模型下游调优中“保持自我”的重要性

隐身自主无人水下航行器技术如何变革水下作战并重塑海军竞争

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

ChatHome: Development and Evaluation of a Domain-Specific Language Model for Home Renovation

Arxiv

0+阅读 · 2023年7月28日

A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition

Arxiv

0+阅读 · 2023年7月28日

RCT Rejection Sampling for Causal Estimation Evaluation

Arxiv

0+阅读 · 2023年7月27日

Automatic Emotion Experiencer Recognition

Arxiv

1+阅读 · 2023年7月27日

Using Gameplay Videos for Detecting Issues in Video Games

Arxiv

0+阅读 · 2023年7月27日

Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking

Arxiv

0+阅读 · 2023年7月26日

Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion

Arxiv

0+阅读 · 2023年7月26日

FedNoisy: Federated Noisy Label Learning Benchmark

Arxiv

0+阅读 · 2023年7月25日

Direct Speech Translation for Automatic Subtitling

Arxiv

0+阅读 · 2023年7月25日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

相关基金

拟南芥FT调控开花的分子机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

副溶血弧菌VI型分泌系统的表型功能及基因调控研究

国家自然科学基金

1+阅读 · 2014年12月31日

GTAT4和Myocardin相互作用调控心肌肥厚

国家自然科学基金

0+阅读 · 2014年12月31日

当归六黄汤免疫调控胰岛素抵抗的作用机制及配伍研究

国家自然科学基金

0+阅读 · 2014年12月31日

免疫性肝损伤过程中细胞色素P450酶系下调的转录水平调节和翻译后蛋白修饰研究

国家自然科学基金

0+阅读 · 2014年12月31日

BER通路基因miRNA结合位点基因多态性与结直肠癌易感性的关联及功能研究

国家自然科学基金

0+阅读 · 2013年12月31日

人泡沫病毒LTR与细胞锌指蛋白ZNF268相互作用机制及功能研究

国家自然科学基金

0+阅读 · 2012年12月31日

猪β-防御素特异性调控免疫靶细胞的筛选及调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

斑马鱼miR-142a-3p影响及调控造血干细胞发育机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

调控家蚕发育非编码RNA（non-coding RNA, ncRNA）的功能解析

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员