A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy

May 6, 2022

Sankaran Panchapagesan, Arun Narayanan, Turaj Zakizadeh Shabestary, Shuai Shao, Nathan Howard, Alex Park, James Walker, Alexander Gruenstein

From arXiv; submitted to Interspeech 2022

Acoustic Echo Cancellation (AEC) is essential for accurate recognition of queries spoken to a smart speaker that is playing out audio. Previous work has shown that a neural AEC model operating on log-mel spectral features (denoted "logmel" hereafter) can greatly improve Automatic Speech Recognition (ASR) accuracy when optimized with an auxiliary loss utilizing a pre-trained ASR model encoder. In this paper, we develop a conformer-based waveform-domain neural AEC model inspired by the "TasNet" architecture. The model is trained by jointly optimizing Negative Scale-Invariant SNR (SISNR) and ASR losses on a large speech dataset. On a realistic rerecorded test set, we find that cascading a linear adaptive AEC and a waveform-domain neural AEC is very effective, giving 56-59% word error rate (WER) reduction over the linear AEC alone. On this test set, the 1.6M parameter waveform-domain neural AEC also improves over a larger 6.5M parameter logmel-domain neural AEC model by 20-29% in easy to moderate conditions. By operating on smaller frames, the waveform neural model is able to perform better at smaller sizes and is better suited for applications where memory is limited.
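
The abstract names negative scale-invariant SNR (SISNR) as one of the two training losses but does not spell out its formula. As a rough illustration only, the sketch below computes the commonly used definition of SI-SNR from the TasNet-style source-separation literature with NumPy; whether the paper uses exactly this variant (for example, with mean removal) is an assumption, and the function name `negative_si_snr`, the `eps` constant, and the synthetic signals are hypothetical.

```python
import numpy as np


def negative_si_snr(estimate, target, eps=1e-8):
    """Negative scale-invariant SNR (in dB) between two 1-D waveforms.

    Assumed definition: project the zero-mean estimate onto the zero-mean
    target, then compare the energy of that projection with the energy of
    the remaining residual. Lower (more negative) values mean a better match.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Remove the mean so a constant offset does not affect the measure.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Component of the estimate that lies along the target signal.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target

    # Everything in the estimate that is not explained by the target.
    residual = estimate - projection

    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return -10.0 * np.log10(ratio)


# Synthetic stand-ins for a clean target and an imperfect model output.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)

loss_noisy = negative_si_snr(noisy, clean)
loss_scaled = negative_si_snr(0.5 * noisy, clean)  # same estimate, half the gain
print(loss_noisy, loss_scaled)
```

Rescaling the estimate leaves the value essentially unchanged, which is the "scale-invariant" property: the loss rewards matching the waveform's shape rather than its absolute level. In the paper this term is optimized jointly with an ASR loss; how the two are weighted is not stated in the abstract and is not shown here.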
