Estimation/Estimator · Speech recognition · Reverberation · State of the art · Automatic speech recognition

March 19, 2023

Towards Improved Room Impulse Response Estimation for Speech Recognition

Anton Ratnarajah,Ishwarya Ananthabhotla,Vamsi Krishna Ithapu,Pablo Hoffmann,Dinesh Manocha,Paul Calamia

From arXiv. Accepted at ICASSP 2023. More results are available at https://anton-jeran.github.io/S2IR/

We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a generative adversarial network (GAN) based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features, and uses a novel energy decay relief loss to optimize for capturing energy-based properties of the input reverberant speech. We show that our model outperforms the state-of-the-art baselines on acoustic benchmarks (by 17% on the energy decay relief and 22% on an early-reflection energy metric), as well as in an ASR evaluation task (by 6.9% in word error rate).
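
The abstract does not state how the energy decay relief (EDR) loss is computed, so the following is only a minimal sketch under stated assumptions. It uses the common textbook definition of EDR as the backward-integrated short-time spectral energy of an RIR (a per-frequency version of the Schroeder decay curve), and it compares two RIRs by the mean absolute difference of their EDRs in dB. The function names, the frame length, the hop size, and the choice of mean absolute error are all hypothetical and are not taken from the paper.

```python
import cmath
import math
import random


def stft_power(signal, frame_len=64, hop=32):
    """Power spectrogram via a naive DFT: list of frames, each a list of |X[k]|^2."""
    # Hann window (assumption: any reasonable analysis window would do here).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc) ** 2)
        frames.append(spectrum)
    return frames


def energy_decay_relief(rir, frame_len=64, hop=32, eps=1e-12):
    """EDR in dB: for frame t and bin k, the energy remaining from frame t onward."""
    power = stft_power(rir, frame_len, hop)
    if not power:
        raise ValueError("RIR is shorter than one analysis frame")
    n_frames, n_bins = len(power), len(power[0])
    edr = [[0.0] * n_bins for _ in range(n_frames)]
    running = [0.0] * n_bins
    # Integrate backward in time so each entry holds the tail energy.
    for t in range(n_frames - 1, -1, -1):
        for k in range(n_bins):
            running[k] += power[t][k]
            edr[t][k] = 10.0 * math.log10(running[k] + eps)
    return edr


def edr_error(rir_a, rir_b, **kwargs):
    """Hypothetical comparison: mean absolute EDR difference in dB.

    Assumes both RIRs have the same length, so their EDRs have the same shape.
    """
    edr_a = energy_decay_relief(rir_a, **kwargs)
    edr_b = energy_decay_relief(rir_b, **kwargs)
    total = sum(abs(a - b)
                for row_a, row_b in zip(edr_a, edr_b)
                for a, b in zip(row_a, row_b))
    return total / (len(edr_a) * len(edr_a[0]))


def synthetic_rir(n, decay, seed=0):
    """Toy RIR: exponentially decaying Gaussian noise (illustration only)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * i) for i in range(n)]


if __name__ == "__main__":
    reference = synthetic_rir(512, decay=0.01)
    estimate = synthetic_rir(512, decay=0.02)
    print("EDR error vs. itself:", edr_error(reference, reference))
    print("EDR error vs. faster decay:", edr_error(reference, estimate))
```

In a training setup the same quantity would normally be computed with a differentiable STFT from a deep learning framework; the pure-Python DFT here only keeps the sketch free of dependencies.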
