Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection - 专知 Papers

Speech recognition · Estimation/estimators · Automatic speech recognition · Error rate · Model evaluation

January 10, 2022

Jing Du,Shiliang Pu,Qinbo Dong,Chao Jin,Xin Qi,Dian Gu,Ru Wu,Hongwei Zhou

From arXiv; submitted to ICASSP 2022; 5 pages, 3 figures.

Although modern automatic speech recognition (ASR) systems can achieve high performance, they may still produce errors that degrade the reading experience and harm downstream tasks. To improve the accuracy and reliability of ASR hypotheses, we propose a cross-modal post-processing system for speech recognizers, which 1) fuses acoustic and textual features from different modalities, 2) jointly trains a confidence estimator and an error corrector in a multi-task learning fashion, and 3) unifies the error correction and utterance rejection modules. Compared with single-modal or single-task models, our proposed system is shown to be more effective and efficient. Experimental results show that our post-processing system yields a relative character error rate (CER) reduction of more than 10% for both single-speaker and multi-speaker speech on our industrial ASR system, with a latency of about 1.7 ms per token, which keeps the extra latency introduced by post-processing acceptable for streaming speech recognition.
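The headline metric in the abstract is a relative reduction in character error rate. As a minimal sketch of how such a number is typically computed (the abstract does not describe the authors' evaluation code, so the helper names `edit_distance`, `cer`, and `relative_reduction` and the toy strings below are assumptions for illustration only), CER can be taken as the character-level Levenshtein distance divided by the total number of reference characters:

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edit errors over total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return errors / chars


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate removed by the improved system."""
    return (baseline - improved) / baseline


# Toy data (hypothetical, not taken from the paper).
refs = ["今天天气很好", "语音识别系统"]
raw_hyps = ["今天天汽很好", "语音识别戏统"]   # one substitution in each utterance
post_hyps = ["今天天气很好", "语音识别戏统"]  # first utterance corrected

cer_raw = cer(refs, raw_hyps)
cer_post = cer(refs, post_hyps)
print(f"CER before: {cer_raw:.4f}, after: {cer_post:.4f}, "
      f"relative reduction: {relative_reduction(cer_raw, cer_post):.1%}")
```

Under this definition, a "more than 10% relative reduction" means the post-processed CER is below 0.9 times the baseline CER, which is a smaller change than a drop of 10 absolute percentage points.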

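Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize spoken language and convert it into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It draws on knowledge and research from computer science, linguistics, and computer engineering.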
