Complex-valued processing has brought deep learning-based speech enhancement and signal extraction to a new level. Typically, the process is based on a time-frequency (TF) mask which is applied to a noisy spectrogram, while complex masks (CMs) are usually preferred over real-valued masks due to their ability to modify the phase. Recent work proposed to use a complex filter instead of a point-wise multiplication with a mask. This allows the model to incorporate information from previous and future time steps, exploiting local correlations within each frequency band. In this work, we propose DeepFilterNet, a two-stage speech enhancement framework utilizing deep filtering. First, we enhance the spectral envelope using ERB-scaled gains modeling the human frequency perception. The second stage employs deep filtering to enhance the periodic components of speech. In addition to taking advantage of perceptual properties of speech, we enforce network sparsity via separable convolutions and extensive grouping in linear and recurrent layers to design a low-complexity architecture. We further show that our two-stage deep filtering approach outperforms complex masks over a variety of frequency resolutions and latencies, and demonstrates convincing performance compared to other state-of-the-art models.
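To illustrate the contrast between a point-wise complex mask and deep filtering described above, the following NumPy sketch applies a per-bin complex filter over neighbouring time frames within each frequency band. This is a minimal sketch under assumed conventions: the array shapes, the variable names (X, M, C), the filter order N, and the look_ahead parameter are illustrative and not taken from the paper.

```python
import numpy as np

def apply_complex_mask(X, M):
    """Point-wise complex mask (CM): one complex gain per TF bin.
    X and M both have shape (T, F) with complex entries."""
    return M * X

def apply_deep_filter(X, C, look_ahead=0):
    """Deep filtering sketch: each enhanced TF bin is a complex linear
    combination of N neighbouring time steps within the same frequency band.
    X: noisy STFT, shape (T, F), complex.
    C: per-bin filter coefficients, shape (T, F, N), complex.
    look_ahead: number of future frames included in the filter (assumption)."""
    T, F = X.shape
    N = C.shape[-1]
    # Pad along the time axis so every frame has N context frames available.
    X_pad = np.pad(X, ((N - 1 - look_ahead, look_ahead), (0, 0)), mode="constant")
    Y = np.zeros_like(X)
    for i in range(N):
        # Shifted copies of X realise the sum over previous (and future) frames.
        Y += C[..., i] * X_pad[i:i + T, :]
    return Y
```

With N = 1 and look_ahead = 0 the deep filter reduces to the point-wise complex mask, which is one way to see why the filter is a strict generalization that can exploit local temporal correlations per frequency band.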