The aim of this work is to investigate the impact of crossmodal self-supervised pre-training for speech reconstruction (video-to-audio) by leveraging the natural co-occurrence of audio and visual streams in videos. We propose LipSound2, which consists of an encoder-decoder architecture with a location-aware attention mechanism that maps face image sequences to mel-scale spectrograms directly, without requiring any human annotations. The proposed LipSound2 model is first pre-trained on $\sim$2400h of multi-lingual (e.g. English and German) audio-visual data (VoxCeleb2). To verify the generalizability of the proposed method, we then fine-tune the pre-trained model on domain-specific datasets (GRID, TCD-TIMIT) for English speech reconstruction and achieve significant improvements in speech quality and intelligibility over previous approaches in both speaker-dependent and speaker-independent settings. In addition to English, we conduct Chinese speech reconstruction on the CMLR dataset to verify the transferability of the approach. Lastly, we build a cascaded lip reading (video-to-text) system by fine-tuning a pre-trained speech recognition system on the generated audio and achieve state-of-the-art performance on both English and Chinese benchmark datasets.
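To make the video-to-spectrogram mapping concrete, the following is a minimal sketch of an encoder-decoder with location-aware attention of the kind described above. It assumes pre-extracted per-frame visual features and Tacotron 2-style location-sensitive attention; all layer sizes, class names, and the `face_feats` input format are hypothetical illustrations, not the paper's exact architecture.

```python
# Hypothetical sketch of a LipSound2-style video-to-mel-spectrogram model.
# Encoder: BiLSTM over face-frame features; decoder: autoregressive LSTM with
# location-aware attention predicting mel frames. Dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocationAwareAttention(nn.Module):
    """Content-based attention augmented with features of past alignments."""
    def __init__(self, enc_dim, dec_dim, attn_dim=128, conv_ch=32, kernel=31):
        super().__init__()
        self.query = nn.Linear(dec_dim, attn_dim, bias=False)
        self.memory = nn.Linear(enc_dim, attn_dim, bias=False)
        self.location = nn.Conv1d(1, conv_ch, kernel, padding=kernel // 2, bias=False)
        self.loc_proj = nn.Linear(conv_ch, attn_dim, bias=False)
        self.score = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_out, prev_align):
        # prev_align: (B, T_enc) cumulative attention weights from earlier steps
        loc = self.loc_proj(self.location(prev_align.unsqueeze(1)).transpose(1, 2))
        energy = self.score(torch.tanh(
            self.query(dec_state).unsqueeze(1) + self.memory(enc_out) + loc)).squeeze(-1)
        align = F.softmax(energy, dim=-1)                       # (B, T_enc)
        context = torch.bmm(align.unsqueeze(1), enc_out).squeeze(1)
        return context, align


class LipSound2Sketch(nn.Module):
    def __init__(self, img_feat=512, enc_dim=256, dec_dim=512, n_mels=80):
        super().__init__()
        self.dec_dim, self.n_mels = dec_dim, n_mels
        self.encoder = nn.LSTM(img_feat, enc_dim // 2, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.attn = LocationAwareAttention(enc_dim, dec_dim)
        self.decoder = nn.LSTMCell(enc_dim + n_mels, dec_dim)
        self.mel_out = nn.Linear(dec_dim + enc_dim, n_mels)

    def forward(self, face_feats, n_frames):
        # face_feats: (B, T_enc, img_feat) pre-extracted visual features
        enc_out, _ = self.encoder(face_feats)
        B, T_enc, _ = enc_out.shape
        h = enc_out.new_zeros(B, self.dec_dim)
        c = enc_out.new_zeros(B, self.dec_dim)
        align_cum = enc_out.new_zeros(B, T_enc)
        prev_mel = enc_out.new_zeros(B, self.n_mels)
        mels = []
        for _ in range(n_frames):
            context, align = self.attn(h, enc_out, align_cum)
            align_cum = align_cum + align
            h, c = self.decoder(torch.cat([context, prev_mel], dim=-1), (h, c))
            prev_mel = self.mel_out(torch.cat([h, context], dim=-1))
            mels.append(prev_mel)
        return torch.stack(mels, dim=1)                          # (B, n_frames, n_mels)


if __name__ == "__main__":
    model = LipSound2Sketch()
    video = torch.randn(2, 75, 512)          # e.g. 75 face frames, 512-d CNN features
    print(model(video, n_frames=300).shape)  # torch.Size([2, 300, 80])
```

The cumulative alignment term lets each decoder step know which video frames have already been attended to, which encourages the monotonic video-to-speech alignment the task requires; a neural vocoder would then convert the predicted mel-spectrogram to a waveform.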