Towards end-to-end speech enhancement with a variational U-Net architecture - Zhuanzhi Papers


Speech enhancement · Estimation/estimators · End-to-end · Networking · Masks

December 7, 2020

Towards end-to-end speech enhancement with a variational U-Net architecture


Eike J. Nustede, Jörn Anemüller

From arXiv. Submitted to IEEE ICASSP 2021.

In this paper, we investigate the viability of a variational U-Net architecture for denoising of single-channel audio data. Deep network speech enhancement systems commonly aim to estimate filter masks, or opt to skip preprocessing steps to directly work on the waveform signal, potentially neglecting relationships across higher dimensional spectro-temporal features. We study the adoption of a probabilistic bottleneck, as well as dilated convolutions, into the classic U-Net architecture. Evaluation of a number of network variants is carried out using signal-to-distortion ratio and perceptual model scores, with audio data including known and unknown noise types as well as reverberation. Our experiments show that the residual (skip) connections in the proposed system are required for successful end-to-end signal enhancement, i.e., without filter mask estimation. Further, they indicate a slight advantage of the variational U-Net architecture over its non-variational version in terms of signal enhancement performance under reverberant conditions. Specifically, PESQ scores show increases of 0.28 and 0.49 in reverberant and non-reverberant scenes, respectively. Anecdotal evidence points to improved suppression of impulsive noise sources with the variational end-to-end U-Net compared to the recurrent mask estimation network baseline.
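The abstract does not spell out the form of the probabilistic bottleneck. As a minimal sketch, assuming a diagonal-Gaussian latent with a standard-normal prior (the usual variational autoencoder setup, which may differ from the authors' design), the sampling step and its regularization term could look like this. The function names `sample_latent` and `kl_to_standard_normal` are hypothetical and chosen for illustration only.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).

    Assumption: the encoder outputs a mean and a log-variance per latent
    dimension; this is the common VAE convention, not a detail taken
    from the paper.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


# Illustrative encoder outputs for a three-dimensional bottleneck.
rng = random.Random(0)
mu = [0.0, 0.5, -1.0]
log_var = [0.0, -2.0, 0.0]

z = sample_latent(mu, log_var, rng)      # stochastic code passed to the decoder
kl = kl_to_standard_normal(mu, log_var)  # penalty added to the reconstruction loss
```

In a deterministic (non-variational) U-Net the bottleneck would simply forward its features; here the decoder sees a noisy sample, and the KL term pulls the latent distribution towards the prior.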



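Related content

Speech enhancement

Speech enhancement is the technique of extracting the useful speech signal from a noisy background, and suppressing or reducing the noise interference, after the speech signal has been disturbed or even drowned out by various kinds of noise. In short, it recovers the original speech, as clean as possible, from noisy speech.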
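How close an enhanced signal comes to the clean speech can be quantified with a signal-to-distortion ratio, one of the measures named in the abstract above. The sketch below uses the plain energy-ratio definition, 10·log10(‖s‖² / ‖s − ŝ‖²). This is an assumption for illustration: the paper may use a different SDR variant, and the function name `sdr_db` is hypothetical.

```python
import math


def sdr_db(clean, estimate):
    """Signal-to-distortion ratio in dB between clean speech and an estimate.

    Uses the simple energy-ratio definition; evaluation toolkits often
    apply additional projections or scaling that are not modelled here.
    """
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    if signal_energy == 0.0:
        raise ValueError("clean signal has zero energy")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy example: a short sine tone with additive interference.
clean = [math.sin(2.0 * math.pi * 5.0 * n / 100.0) for n in range(100)]
noise = [0.3 * math.cos(2.0 * math.pi * 17.0 * n / 100.0) for n in range(100)]
noisy = [s + v for s, v in zip(clean, noise)]
enhanced = [s + 0.1 * v for s, v in zip(clean, noise)]  # 90% of the noise amplitude removed

improvement = sdr_db(clean, enhanced) - sdr_db(clean, noisy)
```

Reducing the residual noise amplitude by a factor of ten raises this SDR by about 20 dB, which is what `improvement` measures in the toy example.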

Latest tutorial "Introduction to Sequence Prediction Problems", 212-slide PPT

Zhuanzhi Member Services

86+ reads · August 22, 2020

[DeepMind] Reinforcement learning tutorial, 83-slide PPT

Zhuanzhi Member Services

158+ reads · August 7, 2020

A concise tutorial on transfer learning, 11-slide PPT

Zhuanzhi Member Services

108+ reads · August 4, 2020

Neural ordinary differential equations tutorial, 50-slide PPT: A brief tutorial on Neural ODEs

Zhuanzhi Member Services

74+ reads · August 2, 2020

[ICML 2020] Parameter-free online optimization for machine learning, 294-slide PPT

Zhuanzhi Member Services

55+ reads · August 1, 2020

A concise tutorial on recurrent neural networks (RNNs), 37-slide PPT

Zhuanzhi Member Services

173+ reads · May 6, 2020

[ACL 2020, Amazon] Multiresolution and Multimodal Speech Recognition with Transformers

Zhuanzhi Member Services

15+ reads · May 5, 2020

Online variational inference, 76-slide PPT: A Regret Bound for Online Variational Inference

Zhuanzhi Member Services

21+ reads · December 2, 2019

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Zhuanzhi Member Services

49+ reads · October 17, 2019

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Zhuanzhi Member Services

59+ reads · October 17, 2019

A big list of BERT/Transformer/transfer-learning NLP resources

Zhuanzhi

19+ reads · June 9, 2019

revelation of MONet

CreateAMind

5+ reads · June 8, 2019

Hierarchically Structured Meta-learning

CreateAMind

27+ reads · May 22, 2019

Spectral Normalization for GANs explained in detail

PaperWeekly

11+ reads · February 13, 2019

Unsupervised meta-learning for representation learning

CreateAMind

27+ reads · January 4, 2019

CCF Rank C | IJCNN 2019 Special Section: Information Theory and Deep Learning

Call4Papers

5+ reads · December 7, 2018

[NIPS 2018] List of accepted papers

Zhuanzhi

5+ reads · September 10, 2018

List of accepted papers at CVPR 2018, a top computer vision conference

Zhuanzhi

7+ reads · May 26, 2018

Hierarchical Disentangled Representations

CreateAMind

4+ reads · April 15, 2018

Auto-Encoding GAN

CreateAMind

7+ reads · August 4, 2017

Speaker and Direction Inferred Dual-channel Speech Separation

arXiv

0+ reads · February 8, 2021

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

arXiv

1+ reads · February 8, 2021

Non-linear frequency warping using constant-Q transformation for speech emotion recognition

arXiv

0+ reads · February 8, 2021

Self-Attention Generative Adversarial Network for Speech Enhancement

arXiv

0+ reads · February 6, 2021

Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection

arXiv

10+ reads · March 13, 2020

End-to-End Multi-speaker Speech Recognition with Transformer

arXiv

8+ reads · February 13, 2020

Phase-aware Speech Enhancement with Deep Complex U-Net

arXiv

15+ reads · March 7, 2019

Hierarchical Generative Modeling for Controllable Speech Synthesis

arXiv

3+ reads · December 27, 2018

Improved Speech Enhancement with the Wave-U-Net

arXiv

8+ reads · November 27, 2018

Automatically Designing CNN Architectures for Medical Image Segmentation

arXiv

10+ reads · July 19, 2018
