Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR - Zhuanzhi (专知) Papers

Speech enhancement · Speech recognition · MoDELS · Robustness · Reducible

May 26, 2022

Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR

Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai

From arXiv; submitted to IEEE/ACM TASLP. arXiv admin note: text overlap with arXiv:2201.08930

Speech enhancement (SE) is usually required as a front end to improve speech quality in noisy environments, but the enhanced speech might not be optimal for automatic speech recognition (ASR) systems because of speech distortion. On the other hand, it has been shown that self-supervised pre-training enables the use of a large amount of unlabeled noisy data, which benefits the noise robustness of ASR. However, the potential of an (optimal) integration of SE and self-supervised pre-training remains unclear. To find an appropriate combination and reduce the impact of the speech distortion caused by SE, in this paper we propose a joint pre-training approach for the SE module and the self-supervised model. First, in the pre-training phase, either the original noisy waveform or the waveform obtained by SE is fed into the self-supervised model to learn the contextual representation, with the quantized clean speech acting as the target. Second, we propose a dual-attention fusion method to fuse the features of noisy and enhanced speech, which can compensate for the information loss caused by using the individual modules separately. Owing to the flexible exploitation of the clean/noisy/enhanced branches, the proposed method turns out to be a generalization of some existing noise-robust ASR models, e.g., enhanced wav2vec2.0. Finally, experimental results on both synthetic and real noisy datasets show that the proposed joint training approach can improve ASR performance under various noisy settings, leading to stronger noise robustness.
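The idea of fusing noisy and enhanced features with attention in two directions can be illustrated with a minimal sketch. The abstract does not describe the architecture, so everything below is an assumption made for illustration and not the authors' implementation: the function names (`cross_attention`, `dual_attention_fusion`), the absence of learned projection matrices, the residual connections, and the final averaging are all hypothetical choices.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query, context):
    """Scaled dot-product attention: each query frame attends over all context frames.

    query:   (T_q, D) feature frames
    context: (T_c, D) feature frames
    returns: (T_q, D) context summarized per query frame
    """
    dim = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(dim))  # (T_q, T_c)
    return weights @ context


def dual_attention_fusion(noisy_feats, enhanced_feats):
    """Hypothetical fusion: attend in both directions, add residuals, then average.

    Assumption: a real system would use learned projections and a learned
    combination instead of this fixed, parameter-free average.
    """
    noisy_view = noisy_feats + cross_attention(noisy_feats, enhanced_feats)
    enhanced_view = enhanced_feats + cross_attention(enhanced_feats, noisy_feats)
    return 0.5 * (noisy_view + enhanced_view)


# Toy inputs: 50 frames of 16-dimensional features for each branch.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((50, 16))
enhanced = rng.standard_normal((50, 16))

fused = dual_attention_fusion(noisy, enhanced)
print(fused.shape)  # (50, 16)
```

In this sketch each branch queries the other, so the fused output keeps information from the noisy features that enhancement may have distorted while still benefiting from the enhanced ones. That is the motivation the abstract gives for fusion; the specific combination rule here is only a placeholder.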

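Related content

Speech enhancement

Speech enhancement refers to techniques that, when a speech signal is corrupted or even buried by various kinds of noise, extract the useful speech signal from the noisy background and suppress or reduce the noise interference. In short, it recovers the cleanest possible original speech from noisy speech.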
