多麦克综合复合光谱绘图,以进行非偏异和持续语音分离 (Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation) - 专知论文

会员服务 ·

0

分离的 · Continuity · CSS · Performer · 极小点 ·

2021 年 5 月 24 日

Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation

翻译：多麦克综合复合光谱绘图,以进行非偏异和持续语音分离

Zhong-Qiu Wang,Peidong Wang,DeLiang Wang

from arxiv, 14 pages, 6 figures. To appear in IEEE/ACM Transactions on Audio, Speech, and Language Processing. Sound demo https://zqwang7.github.io/demos/SMSWSJ_demo/taslp20_SMSWSJ_demo.html

We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions. We aim at both speaker separation and dereverberation. Our study first investigates offline utterance-wise speaker separation and then extends to block-online continuous speech separation (CSS). Assuming a fixed array geometry between training and testing, we train deep neural networks (DNN) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones. We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level speaker counting for block-online CSS. Although our system is trained on simulated room impulse responses (RIR) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry. State-of-the-art separation performance is obtained on the simulated two-talker SMS-WSJ corpus and the real-recorded LibriCSS dataset.

翻译：我们提出多麦克风复杂的光谱绘图,这是应用深度学习用于时间变化的非线性波束成像的简单方法,在回旋条件下让发言者分离,我们的目标是使发言者分离和偏角分解。我们的研究首先调查离线超音速扬声器分离,然后延伸至连线连续语音分离(CSS)。假设在培训和测试之间有一个固定的阵列几何,我们培训深神经网络,从多个麦克风的反射部件中,在参考麦克风中预测目标讲话的真实和想象(RI)组成部分。然后,我们将多麦克风复合光谱制图与最小差异不扭曲反应(MVDR)相融合,同时进行最小差异不扭曲反应(MVDR)成形和后过滤反应,以进一步改进分离,并将它与架在CSS的轮廓级发言者结合起来。虽然我们的系统在模拟室动脉冲反应(RIR)方面受过训练,但该系统根据特定几何测量所安排的固定数目,概括为真实阵列。在模拟的两部S-S-S-S-S-S-S-CS-CS-CS-CS-CS-CS-CS-CS-CS-S-M-M-M-S-S-S-S-S-S-S-S-CS-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-CM-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-M-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S

0

相关内容

分离的

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【LITIS Lab】衔接图卷积神经网络谱域和空间域，Spectral and Spatial Domains in GNN

【LITIS Lab】衔接图卷积神经网络谱域和空间域，Spectral and Spatial Domains in GNN

专知会员服务

25+阅读 · 2020年3月30日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

【中科院计算所】深几何学习综述:从表征的角度，A Survey on Deep Geometry Learning: From a Representation Perspective

【中科院计算所】深几何学习综述:从表征的角度，A Survey on Deep Geometry Learning: From a Representation Perspective

专知会员服务

51+阅读 · 2020年2月22日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

计算机类 | PLDI 2020等国际会议信息6条

计算机类 | PLDI 2020等国际会议信息6条

Call4Papers

3+阅读 · 2019年7月8日

人工智能 | ACCV 2020等国际会议信息5条

人工智能 | ACCV 2020等国际会议信息5条

Call4Papers

6+阅读 · 2019年6月21日

CCF推荐 | 国际会议信息10条

CCF推荐 | 国际会议信息10条

Call4Papers

8+阅读 · 2019年5月27日

计算机 | ICDE 2020等国际会议信息8条

计算机 | ICDE 2020等国际会议信息8条

Call4Papers

3+阅读 · 2019年5月24日

人工智能 | SCI期刊专刊/国际会议信息7条

人工智能 | SCI期刊专刊/国际会议信息7条

Call4Papers

7+阅读 · 2019年3月12日

详解GAN的谱归一化（Spectral Normalization）

详解GAN的谱归一化（Spectral Normalization）

PaperWeekly

11+阅读 · 2019年2月13日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

已删除

将门创投

4+阅读 · 2017年12月12日

Linear and Sublinear Time Spectral Density Estimation

Arxiv

0+阅读 · 2021年7月13日

Direct speech-to-speech translation with discrete units

Direct speech-to-speech translation with discrete units

Arxiv

0+阅读 · 2021年7月12日

Hyperspectral and Multispectral Classification for Coastal Wetland Using Depthwise Feature Interaction Network

Arxiv

0+阅读 · 2021年7月12日

Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings

Arxiv

0+阅读 · 2021年7月11日

A prediction perspective on the Wiener-Hopf equations for discrete time series

Arxiv

0+阅读 · 2021年7月11日

Graph Signal Processing -- Part I: Graphs, Graph Spectra, and Spectral Clustering

Arxiv

14+阅读 · 2019年8月12日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

Domain Specific Approximation for Object Detection

Arxiv

5+阅读 · 2018年10月4日

Speaker Recognition from raw waveform with SincNet

Arxiv

6+阅读 · 2018年7月29日

VIP会员

文章信息

相关主题

相关VIP内容

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【LITIS Lab】衔接图卷积神经网络谱域和空间域，Spectral and Spatial Domains in GNN

【LITIS Lab】衔接图卷积神经网络谱域和空间域，Spectral and Spatial Domains in GNN

专知会员服务

25+阅读 · 2020年3月30日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

【中科院计算所】深几何学习综述:从表征的角度，A Survey on Deep Geometry Learning: From a Representation Perspective

【中科院计算所】深几何学习综述:从表征的角度，A Survey on Deep Geometry Learning: From a Representation Perspective

专知会员服务

51+阅读 · 2020年2月22日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【牛津博士论文】零样本强化学习综述

《美军条令：陆军指挥官与规划人员地理空间指南》60页

战术边缘指挥控制：防务面临的核心挑战

迈向开放世界检测：综述

相关资讯

计算机类 | PLDI 2020等国际会议信息6条

计算机类 | PLDI 2020等国际会议信息6条

Call4Papers

3+阅读 · 2019年7月8日

人工智能 | ACCV 2020等国际会议信息5条

人工智能 | ACCV 2020等国际会议信息5条

Call4Papers

6+阅读 · 2019年6月21日

CCF推荐 | 国际会议信息10条

CCF推荐 | 国际会议信息10条

Call4Papers

8+阅读 · 2019年5月27日

计算机 | ICDE 2020等国际会议信息8条

计算机 | ICDE 2020等国际会议信息8条

Call4Papers

3+阅读 · 2019年5月24日

人工智能 | SCI期刊专刊/国际会议信息7条

人工智能 | SCI期刊专刊/国际会议信息7条

Call4Papers

7+阅读 · 2019年3月12日

详解GAN的谱归一化（Spectral Normalization）

详解GAN的谱归一化（Spectral Normalization）

PaperWeekly

11+阅读 · 2019年2月13日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

已删除

将门创投

4+阅读 · 2017年12月12日

相关论文

Linear and Sublinear Time Spectral Density Estimation

Arxiv

0+阅读 · 2021年7月13日

Direct speech-to-speech translation with discrete units

Direct speech-to-speech translation with discrete units

Arxiv

0+阅读 · 2021年7月12日

Hyperspectral and Multispectral Classification for Coastal Wetland Using Depthwise Feature Interaction Network

Arxiv

0+阅读 · 2021年7月12日

Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings

Arxiv

0+阅读 · 2021年7月11日

A prediction perspective on the Wiener-Hopf equations for discrete time series

Arxiv

0+阅读 · 2021年7月11日

Graph Signal Processing -- Part I: Graphs, Graph Spectra, and Spectral Clustering

Arxiv

14+阅读 · 2019年8月12日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

Domain Specific Approximation for Object Detection

Arxiv

5+阅读 · 2018年10月4日

Speaker Recognition from raw waveform with SincNet

Arxiv

6+阅读 · 2018年7月29日

微信扫码咨询专知VIP会员