MFFCN:促进视听语言强化的多层次地貌融合演变网络 (MFFCN: Multi-layer Feature Fusion Convolution Network for Audio-visual Speech Enhancement) - 专知论文

会员服务 ·

0

语音增强 · 层 · Networking · INFORMS · 卷积 ·

2021 年 1 月 15 日

MFFCN: Multi-layer Feature Fusion Convolution Network for Audio-visual Speech Enhancement

翻译：MFFCN:促进视听语言强化的多层次地貌融合演变网络

Xinmeng Xu,Dongxiang Xu,Jie Jia,Yang Wang,Binbin Chen

The purpose of speech enhancement is to extract target speech signal from a mixture of sounds generated from several sources. Speech enhancement can potentially benefit from the visual information from the target speaker, such as lip move-ment and facial expressions, because the visual aspect of speech isessentially unaffected by acoustic environment. In order to fuse audio and visual information, an audio-visual fusion strategy is proposed, which goes beyond simple feature concatenation and learns to automatically align the two modalities, leading to more powerful representation which increase intelligibility in noisy conditions. The proposed model fuses audio-visual featureslayer by layer, and feed these audio-visual features to each corresponding decoding layer. Experiment results show relative improvement from 6% to 24% on test sets over the audio modalityalone, depending on audio noise level. Moreover, there is a significant increase of PESQ from 1.21 to 2.06 in our -15 dB SNR experiment.

翻译：增强语音的目的是从若干来源产生的声音混合中提取目标语音信号。增强语音可能受益于来自目标发言者的视觉信息,如嘴动和面部表达,因为声音的视觉方面基本上不受声学环境的影响。为了整合音频和视觉信息,提出了视听融合战略,该战略超越简单的特征融合,学会自动调整两种模式,从而在吵闹的条件下提高能见度。拟议模型将视听特征层按层进行整合,并将这些视听特征提供给相应的解码层。实验结果显示,视音响程度而定,在音频模式单体上测试器上从6%到24%的相对改善。此外,在我们 - 15 DNR 实验中,PESQ从1.21%增加到2.06。

0

相关内容

语音增强

语音增强是指当语音信号被各种各样的噪声干扰、甚至淹没后，从噪声背景中提取有用的语音信号，抑制、降低噪声干扰的技术。一句话，从含噪语音中提取尽可能纯净的原始语音。

【AAAI2021】时空融合图神经网络的交通流预测

专知会员服务

110+阅读 · 2020年12月22日

最新《深度学习视频超分》综述论文，30页pdf，Video Super Resolution Based on Deep Learning: A comprehensive survey

最新《深度学习视频超分》综述论文，30页pdf，Video Super Resolution Based on Deep Learning: A comprehensive survey

专知会员服务

25+阅读 · 2020年7月28日

一份实用《图神经网络GNN》笔记，45页pdf

专知会员服务

119+阅读 · 2020年7月22日

【WWW 2019】异质图注意力网络，Heterogeneous Graph Attention Network

【WWW 2019】异质图注意力网络，Heterogeneous Graph Attention Network

专知会员服务

75+阅读 · 2020年6月14日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【自监督学习深度神经网络视觉特征学习综述论文】Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

【自监督学习深度神经网络视觉特征学习综述论文】Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

专知会员服务

87+阅读 · 2020年3月1日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【干货】深度学习视觉跟踪:论文最新综述，23页pdf，Deep Learning for Visual Tracking: A Comprehensive Survey

【干货】深度学习视觉跟踪:论文最新综述，23页pdf，Deep Learning for Visual Tracking: A Comprehensive Survey

专知会员服务

58+阅读 · 2019年12月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

深度卷积神经网络中的降采样

深度卷积神经网络中的降采样

极市平台

12+阅读 · 2019年5月24日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

专知

16+阅读 · 2018年5月14日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

统计学习与视觉计算组

17+阅读 · 2018年3月16日

【论文推荐】最新五篇图像分割相关论文—R2U-Net、ScatterNet混合深度学习、分离卷积编解码、控制、Embedding

【论文推荐】最新五篇图像分割相关论文—R2U-Net、ScatterNet混合深度学习、分离卷积编解码、控制、Embedding

专知

7+阅读 · 2018年2月26日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

Arxiv

0+阅读 · 2021年3月8日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction

Arxiv

9+阅读 · 2019年10月12日

Omni-directional Feature Learning for Person Re-identification

Omni-directional Feature Learning for Person Re-identification

Arxiv

3+阅读 · 2018年12月13日

iQIYI-VID: A Large Dataset for Multi-modal Person Identification

Arxiv

4+阅读 · 2018年11月19日

Global-and-local attention networks for visual recognition

Global-and-local attention networks for visual recognition

Arxiv

5+阅读 · 2018年9月6日

Speaker Recognition from raw waveform with SincNet

Arxiv

6+阅读 · 2018年7月29日

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Arxiv

5+阅读 · 2018年3月27日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

VIP会员

文章信息

相关主题

相关VIP内容

【AAAI2021】时空融合图神经网络的交通流预测

专知会员服务

110+阅读 · 2020年12月22日

最新《深度学习视频超分》综述论文，30页pdf，Video Super Resolution Based on Deep Learning: A comprehensive survey

最新《深度学习视频超分》综述论文，30页pdf，Video Super Resolution Based on Deep Learning: A comprehensive survey

专知会员服务

25+阅读 · 2020年7月28日

一份实用《图神经网络GNN》笔记，45页pdf

专知会员服务

119+阅读 · 2020年7月22日

【WWW 2019】异质图注意力网络，Heterogeneous Graph Attention Network

【WWW 2019】异质图注意力网络，Heterogeneous Graph Attention Network

专知会员服务

75+阅读 · 2020年6月14日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【自监督学习深度神经网络视觉特征学习综述论文】Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

【自监督学习深度神经网络视觉特征学习综述论文】Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

专知会员服务

87+阅读 · 2020年3月1日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【干货】深度学习视觉跟踪:论文最新综述，23页pdf，Deep Learning for Visual Tracking: A Comprehensive Survey

【干货】深度学习视觉跟踪:论文最新综述，23页pdf，Deep Learning for Visual Tracking: A Comprehensive Survey

专知会员服务

58+阅读 · 2019年12月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

大型语言模型遇上文本属性图：一种融合框架与应用的综述

人工智能赋能自主武器与人类控制第三部分：人类控制与系统操作员 | 35页

【博士论文】用于概率程序与生成模型的变分推断

军事指挥控制系统：2025年5种用途

相关资讯

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

深度卷积神经网络中的降采样

深度卷积神经网络中的降采样

极市平台

12+阅读 · 2019年5月24日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

专知

16+阅读 · 2018年5月14日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

统计学习与视觉计算组

17+阅读 · 2018年3月16日

【论文推荐】最新五篇图像分割相关论文—R2U-Net、ScatterNet混合深度学习、分离卷积编解码、控制、Embedding

【论文推荐】最新五篇图像分割相关论文—R2U-Net、ScatterNet混合深度学习、分离卷积编解码、控制、Embedding

专知

7+阅读 · 2018年2月26日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

Arxiv

0+阅读 · 2021年3月8日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction

Arxiv

9+阅读 · 2019年10月12日

Omni-directional Feature Learning for Person Re-identification

Omni-directional Feature Learning for Person Re-identification

Arxiv

3+阅读 · 2018年12月13日

iQIYI-VID: A Large Dataset for Multi-modal Person Identification

Arxiv

4+阅读 · 2018年11月19日

Global-and-local attention networks for visual recognition

Global-and-local attention networks for visual recognition

Arxiv

5+阅读 · 2018年9月6日

Speaker Recognition from raw waveform with SincNet

Arxiv

6+阅读 · 2018年7月29日

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Arxiv

5+阅读 · 2018年3月27日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

微信扫码咨询专知VIP会员