通过学习与多头人关注的视听亲密关系改善视觉语音增强网络 (Improving Visual Speech Enhancement Network by Learning Audio-visual Affinity with Multi-head Attention) - 专知论文

会员服务 ·

0

Attention · 语音增强 · 层 · Learning · Networking ·

2022 年 6 月 30 日

Improving Visual Speech Enhancement Network by Learning Audio-visual Affinity with Multi-head Attention

翻译：通过学习与多头人关注的视听亲密关系改善视觉语音增强网络

Xinmeng Xu,Yang Wang,Jie Jia,Binbin Chen,Dejun Li

from arxiv, Accepted by Interspeech 2022. arXiv admin note: substantial text overlap with arXiv:2101.06268

Audio-visual speech enhancement system is regarded as one of promising solutions for isolating and enhancing speech of desired speaker. Typical methods focus on predicting clean speech spectrum via a naive convolution neural network based encoder-decoder architecture, and these methods a) are not adequate to use data fully, b) are unable to effectively balance audio-visual features. The proposed model alleviates these drawbacks by a) applying a model that fuses audio and visual features layer by layer in encoding phase, and that feeds fused audio-visual features to each corresponding decoder layer, and more importantly, b) introducing a 2-stage multi-head cross attention (MHCA) mechanism to infer audio-visual speech enhancement for balancing the fused audio-visual features and eliminating irrelevant features. This paper proposes attentional audio-visual multi-layer feature fusion model, in which MHCA units are applied to feature mapping at every layer of decoder. The proposed model demonstrates the superior performance of the network against the state-of-the-art models.

翻译：典型的方法侧重于通过一个以编解码器-解码器为基础的天真神经网络结构预测清洁的语音频谱,这些方法a)不足以充分使用数据,b)无法有效地平衡视听特征;拟议模型减轻了这些缺点,其方法是:(a) 应用一种模型,在编码阶段将视听特征层按层层结合,并将合成的视听特征带入每个相应的解码层,更重要的是,(b) 引入一个两阶段多层注意机制,用以推断视听语音增强,以平衡装配的视听特征和消除无关特征;本文提出了注意的视听多层特征聚合模型,其中MHCA单元用于在调解码器的每一层进行地貌制图;拟议模型显示网络相对于最先进的模型的优性能。

0

相关内容

Attention

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

三个不同发育关键期PCB153暴露致小鼠发育免疫毒性及其分子机制

国家自然科学基金

0+阅读 · 2015年12月31日

II-VI族半导体芯/壳纳米线异质结中的应变和掺杂

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

人巨细胞病毒蛋白pUL117调控宿主细胞周期的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Wnt信号调节蛋白Gpr177在小鼠上腭发育中的功能研究

国家自然科学基金

0+阅读 · 2012年12月31日

4f和3d电子调控下的新型In和Te基稀土1：3型半导体化合物的磁输运和结构

国家自然科学基金

0+阅读 · 2012年12月31日

藏药“甲嘎松汤”调节Nrf2/Bach1-ARE-抗氧化酶通路以抗氧化保肝的研究

国家自然科学基金

0+阅读 · 2012年12月31日

表观遗传调控在发育早期铅暴露致LOAD进程中的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

Rho信号通路在张应变诱导人牙周膜细胞细胞骨架重建中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

Narf影响细胞衰老的分子机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

Low-light Enhancement Method Based on Attention Map Net

Arxiv

0+阅读 · 2022年8月19日

Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods

Arxiv

0+阅读 · 2022年8月19日

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

Arxiv

0+阅读 · 2022年8月18日

Enhancing Targeted Attack Transferability via Diversified Weight Pruning

Arxiv

0+阅读 · 2022年8月18日

Unifying Visual Perception by Dispersible Points Learning

Arxiv

0+阅读 · 2022年8月18日

Visual Explanation of Deep Q-Network for Robot Navigation by Fine-tuning Attention Branch

Arxiv

0+阅读 · 2022年8月18日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Arxiv

78+阅读 · 2019年11月10日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

VIP会员

文章信息

相关主题

相关VIP内容

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《人与智能体在系统工程建模语言V2任务中的性能表现：基于用户中心化的评估方法》308页

《数据安全国家标准体系（2025版）》征求意见稿

AlphaMosaic：人工智能赋能的作战管理系统

《军事行动中通信平台的战略价值：提升战术效能与作战优势》

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

相关论文

Low-light Enhancement Method Based on Attention Map Net

Arxiv

0+阅读 · 2022年8月19日

Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods

Arxiv

0+阅读 · 2022年8月19日

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

Arxiv

0+阅读 · 2022年8月18日

Enhancing Targeted Attack Transferability via Diversified Weight Pruning

Arxiv

0+阅读 · 2022年8月18日

Unifying Visual Perception by Dispersible Points Learning

Arxiv

0+阅读 · 2022年8月18日

Visual Explanation of Deep Q-Network for Robot Navigation by Fine-tuning Attention Branch

Arxiv

0+阅读 · 2022年8月18日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Arxiv

78+阅读 · 2019年11月10日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

相关基金

三个不同发育关键期PCB153暴露致小鼠发育免疫毒性及其分子机制

国家自然科学基金

0+阅读 · 2015年12月31日

II-VI族半导体芯/壳纳米线异质结中的应变和掺杂

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

人巨细胞病毒蛋白pUL117调控宿主细胞周期的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Wnt信号调节蛋白Gpr177在小鼠上腭发育中的功能研究

国家自然科学基金

0+阅读 · 2012年12月31日

4f和3d电子调控下的新型In和Te基稀土1：3型半导体化合物的磁输运和结构

国家自然科学基金

0+阅读 · 2012年12月31日

藏药“甲嘎松汤”调节Nrf2/Bach1-ARE-抗氧化酶通路以抗氧化保肝的研究

国家自然科学基金

0+阅读 · 2012年12月31日

表观遗传调控在发育早期铅暴露致LOAD进程中的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

Rho信号通路在张应变诱导人牙周膜细胞细胞骨架重建中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

Narf影响细胞衰老的分子机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员