Multi-channel Multi-frame ADL-MVDR for Target Speech Separation - Zhuanzhi (专知) Papers


Tags: Separation · Speech Recognition · Neural Networks · Automatic Speech Recognition · Minima

November 15, 2021

Multi-channel Multi-frame ADL-MVDR for Target Speech Separation


Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu

From arXiv. Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP). Demos are available at https://zzhang68.github.io/mcmf-adl-mvdr/

Many purely neural-network-based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions. However, conventional neural mask-based MVDR systems still result in relatively high levels of residual noise. Moreover, the matrix inverse involved in the MVDR solution is sometimes numerically unstable during joint training with neural networks. In this study, we propose a multi-channel multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech separation, which extends our preliminary multi-channel ADL-MVDR approach. The proposed MCMF ADL-MVDR system addresses both linear and nonlinear distortions, and it fully utilizes spatio-temporal cross correlations. The proposed systems are evaluated on a Mandarin audio-visual corpus and compared with several state-of-the-art approaches. Experimental results demonstrate the superiority of the proposed systems under different scenarios and across several objective evaluation metrics, including ASR performance.
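To make "the matrix inverse involved in the MVDR solution" concrete, here is a minimal NumPy sketch of a conventional mask-based MVDR beamformer, the kind of baseline the abstract contrasts with. It is not the authors' ADL-MVDR or MCMF ADL-MVDR. Assumptions: a Souden-style weight formula w = Φ_N⁻¹Φ_S u / tr(Φ_N⁻¹Φ_S), where u selects a reference microphone; an STFT input of shape (C, F, T); real-valued speech and noise masks predicted elsewhere; and the hypothetical helper names masked_covariance, mvdr_weights and mask_based_mvdr. The diagonal loading and the np.linalg.solve call are common numerical safeguards for the inverse, not the paper's remedy. The sketch also filters one frame at a time, so it does not model the spatio-temporal (cross-frame) correlations that the MCMF system exploits.

```python
import numpy as np


def masked_covariance(Y, mask, eps=1e-8):
    """Mask-weighted spatial covariance for one frequency bin.

    Y:    complex STFT of one bin, shape (C, T) (C microphones, T frames)
    mask: real-valued time-frequency mask for that bin, shape (T,)
    Returns a (C, C) Hermitian matrix: sum_t m_t y_t y_t^H / sum_t m_t.
    """
    return (mask * Y) @ Y.conj().T / max(float(mask.sum()), eps)


def mvdr_weights(phi_s, phi_n, ref_mic=0, loading=1e-6, eps=1e-12):
    """Souden-style MVDR weights w = Phi_n^{-1} Phi_s u / tr(Phi_n^{-1} Phi_s).

    The diagonal loading is an assumed regularizer (not part of ADL-MVDR)
    that keeps Phi_n invertible when it is close to singular.
    """
    C = phi_n.shape[0]
    phi_n = phi_n + loading * np.eye(C)
    # Solve Phi_n X = Phi_s rather than forming Phi_n^{-1} explicitly.
    ratio = np.linalg.solve(phi_n, phi_s)
    # ratio @ u is just the reference-microphone column of ratio.
    return ratio[:, ref_mic] / (np.trace(ratio) + eps)


def mask_based_mvdr(mix_stft, speech_mask, noise_mask, ref_mic=0):
    """Beamform a (C, F, T) multi-channel STFT into a (F, T) single-channel STFT."""
    C, F, T = mix_stft.shape
    out = np.empty((F, T), dtype=complex)
    for f in range(F):
        Y = mix_stft[:, f, :]
        phi_s = masked_covariance(Y, speech_mask[f])
        phi_n = masked_covariance(Y, noise_mask[f])
        w = mvdr_weights(phi_s, phi_n, ref_mic)
        out[f] = w.conj() @ Y  # w^H y(t, f) for every frame t
    return out
```

As a sanity check on the formula: for a rank-one speech covariance Φ_S = d dᴴ and loading set to 0, the weights satisfy wᴴd = d[ref_mic] for any positive-definite Φ_N, which is the distortionless constraint at the reference microphone.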


