ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization - Zhuanzhi (专知) Papers


Speech recognition · Generalization theory · Model evaluation · Channel · E2E

September 23, 2021

ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization


Marco Gaudesi,Felix Weninger,Dushyant Sharma,Puming Zhan

From arXiv; to appear in ASRU 2021

End-to-end (E2E) multi-channel ASR systems show state-of-the-art performance in far-field ASR tasks by joint training of a multi-channel front-end along with the ASR model. The main limitation of such systems is that they are usually trained with data from a fixed array geometry, which can lead to degradation in accuracy when a different array is used in testing. This makes it challenging to deploy these systems in practice, as it is costly to retrain and deploy different models for various array configurations. To address this, we present a simple and effective data augmentation technique, which is based on randomly dropping channels in the multi-channel audio input during training, in order to improve the robustness to various array configurations at test time. We call this technique ChannelAugment, in contrast to SpecAugment (SA) which drops time and/or frequency components of a single channel input audio. We apply ChannelAugment to the Spatial Filtering (SF) and Minimum Variance Distortionless Response (MVDR) neural beamforming approaches. For SF, we observe 10.6% WER improvement across various array configurations employing different numbers of microphones. For MVDR, we achieve a 74% reduction in training time without causing degradation of recognition accuracy.
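The channel-dropping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `channel_augment`, the `min_keep` parameter, and the choice to replace dropped channels with silence (rather than removing them from the input) are all assumptions made for illustration, since the abstract does not specify these details.

```python
import random
from typing import List, Optional, Sequence


def channel_augment(
    channels: Sequence[Sequence[float]],
    min_keep: int = 1,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Randomly silence input channels (hypothetical helper, not from the paper).

    `channels` holds one sample sequence per microphone. At least `min_keep`
    channels are left untouched. Dropped channels are replaced by zeros here,
    which is an assumption: it keeps the channel count fixed for the front-end.
    """
    rng = rng or random.Random()
    num_channels = len(channels)
    if not 1 <= min_keep <= num_channels:
        raise ValueError("min_keep must be between 1 and the number of channels")

    # Draw how many channels survive this training example, then which ones.
    num_keep = rng.randint(min_keep, num_channels)
    kept = set(rng.sample(range(num_channels), num_keep))

    return [
        list(ch) if idx in kept else [0.0] * len(ch)
        for idx, ch in enumerate(channels)
    ]


# Usage: an 8-microphone toy signal with three samples per channel.
audio = [[0.1, 0.2, 0.3] for _ in range(8)]
augmented = channel_augment(audio, min_keep=2, rng=random.Random(0))
```

Applied independently to each training utterance, such a routine would expose the model to many effective array sizes drawn from the same recordings, which is the intuition behind the robustness to different array configurations reported above.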


