Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements. This approach has been shown to yield improvements over audio-only speech enhancement, particularly for the removal of interfering speech. Despite recent advances in speech synthesis, most audio-visual approaches continue to use spectral mapping/masking to reproduce the clean audio, often resulting in visual backbones being added to existing speech enhancement architectures. In this work, we propose LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audio-visual speech via a transformer-based architecture, and then converts them into waveform audio using a neural vocoder (HiFi-GAN). We train and evaluate our framework on thousands of speakers and 11+ different languages, and study our model's ability to adapt to different levels of background noise and speech interference. Our experiments show that LA-VocE outperforms existing methods according to multiple metrics, particularly under very noisy scenarios.
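The first stage above predicts mel-spectrograms as an intermediate representation before vocoding. As background, the mel scale that underlies this representation is a standard perceptual frequency warping; the minimal sketch below (using the common HTK-style formula, not code from LA-VocE itself) illustrates the Hz-to-mel mapping that mel filterbanks are built on.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to the mel scale (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse mapping: mel value back to frequency in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# The mel scale is roughly linear below 1 kHz and logarithmic above,
# approximating human pitch perception -- one reason mel-spectrograms
# are a common intermediate target for neural vocoders such as HiFi-GAN.
print(round(hz_to_mel(1000.0), 1))  # ~1000 mel at 1 kHz by construction
```

A mel-spectrogram is then obtained by applying a bank of triangular filters, spaced evenly on this mel axis, to the magnitude spectrogram of the signal; the vocoder learns the inverse mapping from that compressed representation back to a waveform.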