Cleanformer: A microphone array configuration-invariant, streaming, multichannel neural enhancement frontend for ASR


Speech recognition · Noise · MoDELS · Masking · Transcripts

April 28, 2022


Joseph Caroselli, Arun Narayanan, Tom O'Malley

From arXiv; submitted to Interspeech 2022

This work introduces the Cleanformer, a streaming, multichannel, neural-based enhancement frontend for automatic speech recognition (ASR). The model has a conformer-based architecture that takes as inputs a single channel each of raw and enhanced signals, and uses self-attention to derive a time-frequency mask. The enhanced input is generated by a multichannel adaptive noise cancellation algorithm known as Speech Cleaner, which makes use of noise context to derive its filter taps. The time-frequency mask is applied to the noisy input to produce enhanced output features for ASR. Detailed evaluations are presented with simulated and re-recorded datasets in speech-based and non-speech-based noise; they show a significant reduction in word error rate (WER) when using a large-scale, state-of-the-art ASR model. The model is also shown to significantly outperform enhancement using a beamformer with ideal steering. The enhancement model is agnostic of the number of microphones and the array configuration and can therefore be used with different microphone arrays without retraining. Performance is demonstrated to improve with more microphones, up to 4, with each additional microphone providing a smaller marginal benefit. Specifically, for an SNR of -6 dB, relative WER improvements of about 80% are shown in both noise conditions.
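As a rough illustration of two ideas in the abstract, the sketch below applies a time-frequency mask to noisy features and computes a relative WER improvement. It is not the authors' implementation. The abstract does not say how the mask is applied or which feature domain is used, so the element-wise multiplication, the [0, 1] mask range, the function names `apply_tf_mask` and `relative_wer_improvement`, and all numeric values are assumptions chosen for illustration.

```python
def apply_tf_mask(noisy_features, mask):
    """Element-wise multiply a time-frequency mask with noisy features.

    Both arguments are lists of frames; each frame is a list of
    per-frequency-bin values. Mask values are assumed to lie in [0, 1].
    """
    if len(noisy_features) != len(mask):
        raise ValueError("features and mask must have the same number of frames")
    enhanced = []
    for frame, mask_frame in zip(noisy_features, mask):
        if len(frame) != len(mask_frame):
            raise ValueError("frames must have the same number of frequency bins")
        enhanced.append([x * m for x, m in zip(frame, mask_frame)])
    return enhanced


def relative_wer_improvement(baseline_wer, enhanced_wer):
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - enhanced_wer) / baseline_wer


# Toy input: 2 frames x 3 frequency bins of (assumed) magnitude features.
noisy = [[1.0, 2.0, 4.0], [0.5, 8.0, 2.0]]
mask = [[1.0, 0.5, 0.0], [0.0, 0.25, 1.0]]
enhanced = apply_tf_mask(noisy, mask)

# Hypothetical WER values, chosen only to illustrate the arithmetic.
improvement = relative_wer_improvement(baseline_wer=50.0, enhanced_wer=10.0)
```

With these made-up numbers, a drop from 50% to 10% WER is a relative improvement of 0.8, the same order as the roughly 80% the abstract reports at -6 dB SNR.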



Related content

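Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize spoken language and convert it into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It draws on knowledge and research from computer science, linguistics, and computer engineering.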

