Anchored Speech Recognition with Neural Transducers - Zhuanzhi Papers

October 20, 2022

Anchored Speech Recognition with Neural Transducers

Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

From arXiv; submitted to IEEE ICASSP 2023.

Neural transducers have gained popularity in production ASR systems, achieving human-level recognition accuracy on standard benchmark datasets. However, their performance degrades significantly in the presence of crosstalk, especially when the background speech/noise is non-negligible compared to the primary speech (i.e., low signal-to-noise ratio). Anchored speech recognition refers to a class of methods that use information from an anchor segment (e.g., wake-words) to recognize device-directed speech while ignoring interfering background speech/noise. In this paper, we investigate anchored speech recognition in the context of neural transducers. We use a tiny auxiliary network to extract context information from the anchor segment, and explore encoder biasing and joiner gating to guide the transducer towards the target speech. Moreover, to improve the robustness of context embedding extraction, we propose auxiliary training objectives to disentangle lexical content from speaking style. Our proposed methods are evaluated on synthetic LibriSpeech-based mixtures, where they improve word error rates by up to 36% compared to a background augmentation baseline.
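The abstract names three components (a context extractor, encoder biasing, and joiner gating) without giving their formulas. The toy sketch below shows one plausible reading of each, and is not the authors' implementation. Everything specific in it is an assumption made for illustration: the mean-pooling extractor, the additive form of the bias, the sigmoid form of the gate, the random weights, the toy dimensions, and all function names.

```python
import math
import random

def matvec(vec, mat):
    """Row vector (length n) times matrix (n x m) -> list of length m."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def anchor_embedding(anchor_frames, w_ctx):
    """Assumed extractor: mean-pool the anchor frames, project, squash with tanh."""
    n = len(anchor_frames)
    pooled = [sum(col) / n for col in zip(*anchor_frames)]
    return [math.tanh(x) for x in matvec(pooled, w_ctx)]

def bias_encoder(enc_frames, ctx, w_bias):
    """Assumed encoder biasing: add one context-derived offset to every encoder frame."""
    offset = matvec(ctx, w_bias)
    return [[h + b for h, b in zip(frame, offset)] for frame in enc_frames]

def gate_joiner(joint_vec, ctx, w_gate):
    """Assumed joiner gating: scale each joiner activation by a sigmoid gate in (0, 1)."""
    gate = [1.0 / (1.0 + math.exp(-z)) for z in matvec(ctx, w_gate)]
    return [h * g for h, g in zip(joint_vec, gate)]

def rand_matrix(rows, cols, rng):
    """Random weights stand in for trained parameters (illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
FEAT, CTX, ENC, JOINT = 8, 4, 6, 5        # toy dimensions, chosen arbitrarily
anchor = rand_matrix(10, FEAT, rng)       # 10 frames of anchor (wake-word) features
enc_out = rand_matrix(12, ENC, rng)       # 12 encoder frames for the whole utterance
joint_vec = [rng.uniform(-1, 1) for _ in range(JOINT)]  # one joiner activation vector

ctx = anchor_embedding(anchor, rand_matrix(FEAT, CTX, rng))
biased = bias_encoder(enc_out, ctx, rand_matrix(CTX, ENC, rng))
gated = gate_joiner(joint_vec, ctx, rand_matrix(CTX, JOINT, rng))
print(len(ctx), len(biased), len(biased[0]), len(gated))  # prints: 4 12 6 5
```

In this reading, biasing shifts the acoustic representation before it reaches the joiner, while gating can only attenuate joiner activations, since each gate value lies strictly between 0 and 1. Which of the two works better, and where exactly they are applied, is something only the paper itself can answer.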
