Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition - Zhuanzhi Papers

Performer · Speech recognition · Estimation/estimator · MoDELS · Feature extraction

August 12, 2021

Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition

Anurenjan Purushothaman,Anirudh Sreeram,Rohit Kumar,Sriram Ganapathy

The task of speech recognition in far-field environments is adversely affected by reverberant artifacts that appear as temporal smearing of the sub-band envelopes. In this paper, we develop a neural model for speech dereverberation that operates on the long-term sub-band envelopes of speech. The sub-band envelopes are derived using frequency domain linear prediction (FDLP), which performs an autoregressive estimation of the Hilbert envelopes. The neural dereverberation model estimates an envelope gain which, when applied to the reverberant signal, suppresses the late reflection components of the far-field signal. The dereverberated envelopes are then used for feature extraction in speech recognition. Further, the sequence of steps involved in envelope dereverberation, feature extraction and acoustic modeling for ASR can be implemented as a single neural processing pipeline, which allows joint learning of the dereverberation network and the acoustic model. Experiments are performed on the REVERB challenge, CHiME-3 and VOiCES datasets. In these experiments, joint learning of envelope dereverberation and the acoustic model yields significant performance improvements over a baseline ASR system based on log-mel spectrograms, as well as over previous dereverberation approaches (average relative improvements of 10-24% over the baseline system). A detailed analysis of the choice of hyper-parameters and the cost function involved in envelope dereverberation is also provided.
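
The FDLP step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single full-band envelope (no sub-band windowing of the DCT coefficients), omits the neural gain model entirely, and uses only the Python standard library. The function names, the model order of 10, the small diagonal loading term and the synthetic test signal are all assumptions made for illustration. The idea shown is the standard duality: linear prediction applied to the DCT of a signal yields an all-pole model whose "spectrum", read along the time axis, approximates the squared Hilbert envelope.

```python
import cmath
import math


def dct_ii(x):
    """Unnormalised DCT-II, O(N^2); adequate for short illustrative segments."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def autocorrelation(y, max_lag):
    """Biased autocorrelation of y for lags 0..max_lag."""
    n = len(y)
    return [
        sum(y[i] * y[i + lag] for i in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order=10):
    """All-pole (autoregressive) estimate of the squared temporal envelope of x."""
    n = len(x)
    y = dct_ii(x)
    r = autocorrelation(y, order)
    if r[0] == 0.0:
        return [0.0] * n  # silent segment: nothing to model
    r[0] *= 1.0 + 1e-9  # assumption: tiny diagonal loading for numerical stability
    a, err = levinson_durbin(r, order)
    env = []
    for t in range(n):
        # Time index t maps to the angle of the DCT-II basis function.
        theta = math.pi * (t + 0.5) / n
        denom = sum(a[j] * cmath.exp(-1j * theta * j) for j in range(order + 1))
        env.append(err / abs(denom) ** 2)
    return env


def modulated_tone(n=256, centre=100, width=20.0, floor=0.05, freq=0.2):
    """Hypothetical test signal: a tone whose amplitude peaks around `centre`."""
    return [
        (floor + math.exp(-0.5 * ((t - centre) / width) ** 2))
        * math.cos(2.0 * math.pi * freq * t)
        for t in range(n)
    ]


if __name__ == "__main__":
    envelope = fdlp_envelope(modulated_tone(), order=10)
    peak = max(range(len(envelope)), key=lambda t: envelope[t])
    print("estimated envelope peaks near sample", peak)
```

In a sub-band setting, the same prediction step would be applied separately to windowed ranges of the DCT coefficients, one window per band. A practical implementation would use an FFT-based DCT from a numerical library rather than the quadratic-time loop used here.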
