We propose a multi-channel speech enhancement approach with a novel two-stage feature fusion method and a pre-trained acoustic model in a multi-task learning paradigm. In the first fusion stage, time-domain and frequency-domain features are extracted separately. In the time domain, the multi-channel convolution sum (MCS) and inter-channel convolution difference (ICD) features are computed and then integrated by a first 2-D convolutional layer, while in the frequency domain, the log-power spectra (LPS) features of both the original channels and the super-directive beamforming outputs are combined by a second 2-D convolutional layer. To fully integrate the rich information of multi-channel speech, i.e., the time- and frequency-domain features and the array geometry, we apply a third 2-D convolutional layer in the second fusion stage to obtain the final convolutional features. Furthermore, we propose to use a fixed clean acoustic model, trained with the end-to-end lattice-free maximum mutual information criterion, to enforce the enhanced output to follow the same distribution as the clean waveform, which alleviates the over-estimation problem of the enhancement task and constrains speech distortion. On the Task 1 development set of the ConferencingSpeech 2021 challenge, PESQ improvements of 0.24 and 0.19 are attained over the official baseline and a recently proposed multi-channel separation method, respectively.
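To make the two-stage fusion concrete, the following is a minimal PyTorch-style sketch, not the paper's exact configuration: the module name `TwoStageFusion`, the channel counts, the kernel sizes, and the concatenation-based fusion are illustrative assumptions; only the overall structure (one 2-D conv per domain in stage one, a third 2-D conv over both streams in stage two) follows the description above.

```python
import torch
import torch.nn as nn


class TwoStageFusion(nn.Module):
    """Sketch of the two-stage feature fusion (illustrative shapes/sizes).

    Stage 1: one 2-D conv fuses the time-domain maps (MCS + ICDs) and a
    second 2-D conv fuses the frequency-domain maps (LPS of the original
    channels and of the beamforming outputs).
    Stage 2: a third 2-D conv over both streams yields the final features.
    """

    def __init__(self, time_in_ch: int, freq_in_ch: int,
                 mid_ch: int = 32, out_ch: int = 64):
        super().__init__()
        # First fusion stage: separate 2-D convs per domain.
        self.time_conv = nn.Conv2d(time_in_ch, mid_ch, kernel_size=3, padding=1)
        self.freq_conv = nn.Conv2d(freq_in_ch, mid_ch, kernel_size=3, padding=1)
        # Second fusion stage: one 2-D conv over the concatenated streams.
        self.fusion_conv = nn.Conv2d(2 * mid_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, time_feats: torch.Tensor,
                freq_feats: torch.Tensor) -> torch.Tensor:
        # time_feats: (B, time_in_ch, T, F) stacked MCS/ICD feature maps
        # freq_feats: (B, freq_in_ch, T, F) stacked LPS feature maps
        t = self.act(self.time_conv(time_feats))
        f = self.act(self.freq_conv(freq_feats))
        fused = torch.cat([t, f], dim=1)          # merge the two domains
        return self.act(self.fusion_conv(fused))  # final convolutional features


if __name__ == "__main__":
    # Hypothetical 8-channel array: e.g. 1 MCS + 7 ICD maps in time,
    # 8 channel LPS + 3 beamformer LPS maps in frequency, on a (T=100, F=257) grid.
    fusion = TwoStageFusion(time_in_ch=8, freq_in_ch=11)
    time_feats = torch.randn(2, 8, 100, 257)
    freq_feats = torch.randn(2, 11, 100, 257)
    print(fusion(time_feats, freq_feats).shape)  # torch.Size([2, 64, 100, 257])
```

Concatenation followed by a single convolution is only one plausible reading of "a third 2-D convolutional layer"; the actual feature dimensions and fusion details are specified in the body of the paper.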