Automatic speech recognition (ASR) systems degrade significantly under noisy conditions. Recently, speech enhancement (SE) has been introduced as a front-end module to reduce noise and improve speech quality for ASR, but it can also suppress some important speech information, i.e., the over-suppression problem. To alleviate this, we propose a dual-path style learning approach for end-to-end noise-robust automatic speech recognition (DPSL-ASR). Specifically, we first introduce the clean speech feature along with the fused feature from the previously proposed IFF-Net as dual-path inputs to recover the over-suppressed information. Then, we propose a style learning method that maps the fused feature close to the clean feature, in order to learn latent speech information from the latter, i.e., the clean "speech style". Furthermore, we employ a consistency loss to minimize the distance between the ASR outputs of the two paths to improve noise robustness. Experimental results show that the proposed approach achieves relative word error rate (WER) reductions of 10.6% and 8.6% over the best IFF-Net baseline on the RATS Channel-A and CHiME-4 1-Channel Track datasets, respectively. Visualizations of intermediate embeddings indicate that DPSL-ASR can recover abundant over-suppressed information in enhanced speech. Our code is available at GitHub: https://github.com/YUCHEN005/DPSL-ASR.
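To make the training objective concrete, the following is a minimal PyTorch-style sketch of how the dual-path objectives described above could be combined. It assumes an MSE loss for the style learning term, a KL-divergence loss for the output consistency term, and CTC for the per-path ASR losses; the function name `dpsl_losses`, the tensor shapes, the stop-gradient (`detach`) choices, and the weights `lambda_style` / `lambda_consist` are illustrative assumptions rather than the paper's exact formulation.

```python
import torch.nn.functional as F

def dpsl_losses(enc_clean, enc_fused, logprobs_clean, logprobs_fused,
                targets, input_lengths, target_lengths,
                lambda_style=1.0, lambda_consist=1.0):
    """Sketch of the combined dual-path training objective (assumed forms).

    enc_clean / enc_fused : encoder features of the clean-speech path and the
        IFF-Net fused-feature path, shape (B, T, D).
    logprobs_clean / logprobs_fused : per-path ASR output log-probabilities,
        shape (T, B, V) as expected by CTC.
    """
    # Style learning term: pull the fused feature toward the clean feature
    # so it recovers over-suppressed speech information (MSE is an assumption).
    style_loss = F.mse_loss(enc_fused, enc_clean.detach())

    # Consistency term: keep the ASR outputs of the two paths close
    # (KL divergence and the stop-gradient on the clean path are assumptions).
    consist_loss = F.kl_div(logprobs_fused, logprobs_clean.detach().exp(),
                            reduction="batchmean")

    # Standard ASR loss on each path (CTC assumed here for brevity).
    ctc_fused = F.ctc_loss(logprobs_fused, targets, input_lengths, target_lengths)
    ctc_clean = F.ctc_loss(logprobs_clean, targets, input_lengths, target_lengths)

    return (ctc_fused + ctc_clean
            + lambda_style * style_loss
            + lambda_consist * consist_loss)
```

In this sketch both paths share the ASR back-end, and only the relative weighting of the auxiliary terms would need tuning; the actual loss definitions and weighting follow the implementation in the linked repository.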