以终端对终端、模块系统和人 (Arabic Speech Recognition by End-to-End, Modular Systems and Human) - 专知论文

会员服务 ·

0

Performer · 语音识别 · 端到端 · Better · 自动语音识别 ·

2021 年 1 月 21 日

Arabic Speech Recognition by End-to-End, Modular Systems and Human

翻译：以终端对终端、模块系统和人

Amir Hussein,Shinji Watanabe,Ahmed Ali

Recent advances in automatic speech recognition (ASR) have achieved accuracy levels comparable to human transcribers, which led researchers to debate if the machine has reached human performance. Previous work focused on the English language and modular hidden Markov model-deep neural network (HMM-DNN) systems. In this paper, we perform a comprehensive benchmarking for end-to-end transformer ASR, modular HMM-DNN ASR, and human speech recognition (HSR) on the Arabic language and its dialects. For the HSR, we evaluate linguist performance and lay-native speaker performance on a new dataset collected as a part of this study. For ASR the end-to-end work led to 12.5%, 27.5%, 33.8% WER; a new performance milestone for the MGB2, MGB3, and MGB5 challenges respectively. Our results suggest that human performance in the Arabic language is still considerably better than the machine with an absolute WER gap of 3.6% on average.

翻译：在自动语音识别(ASR)方面的最新进展达到了与人类传译器相似的准确度,这使得研究人员在机器达到人类性能时可以进行辩论。以前的工作重点是英语和模块隐藏的Markov 模型深神经网络(HMM-DNNN)系统。在本文中,我们对端到端变压器ASR、模块HMM-DNN AS(HSR)和人文语音识别(HSR)的阿拉伯语及其方言进行了全面的衡量标准。在《HSR》中,我们评估了作为本研究一部分收集的新数据集的语言性能和外形语语语音表现。对于ASR来说,端到端工作导致12.5%、27.5%、33.8%的WER;分别是MGB2、MGB3和MGB5挑战的一个新的性能里程碑。我们的结果表明,阿拉伯语的人类性能仍然大大高于机器,平均3.6%的绝对WER差距。

0

相关内容

Performer

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

专知会员服务

26+阅读 · 2020年2月16日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【哥伦比亚大学应用机器学习课程2020】《COMS W4995 Applied Machine Learning Spring 2020》

【哥伦比亚大学应用机器学习课程2020】《COMS W4995 Applied Machine Learning Spring 2020》

专知会员服务

26+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

Interspeech 2019 回顾 | 从顶会看语音技术的发展趋势

Interspeech 2019 回顾 | 从顶会看语音技术的发展趋势

DataFunTalk

11+阅读 · 2020年3月10日

Interspeech 2019 | 从顶会看语音技术的发展趋势

Interspeech 2019 | 从顶会看语音技术的发展趋势

AI科技评论

16+阅读 · 2019年9月19日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

笔记 | Deep active learning for named entity recognition

笔记 | Deep active learning for named entity recognition

黑龙江大学自然语言处理实验室

24+阅读 · 2018年5月27日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

carla 体验效果及代码

carla 体验效果及代码

CreateAMind

7+阅读 · 2018年2月3日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Adversarial Driving: Attacking End-to-End Autonomous Driving Systems

Arxiv

1+阅读 · 2021年3月16日

OkwuGbé: End-to-End Speech Recognition for Fon and Igbo

Arxiv

0+阅读 · 2021年3月13日

Streaming end-to-end multi-talker speech recognition

Arxiv

0+阅读 · 2021年3月12日

Automatic Romanization of Arabic Bibliographic Records

Arxiv

0+阅读 · 2021年3月12日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

A Comparative Study on Transformer vs RNN in Speech Applications

A Comparative Study on Transformer vs RNN in Speech Applications

Arxiv

4+阅读 · 2019年9月13日

Exploring RNN-Transducer for Chinese Speech Recognition

Arxiv

4+阅读 · 2019年4月23日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Arxiv

7+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

自动语音识别

相关VIP内容

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

专知会员服务

26+阅读 · 2020年2月16日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【哥伦比亚大学应用机器学习课程2020】《COMS W4995 Applied Machine Learning Spring 2020》

【哥伦比亚大学应用机器学习课程2020】《COMS W4995 Applied Machine Learning Spring 2020》

专知会员服务

26+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

热门VIP内容

开通专知VIP会员享更多权益服务

《以任务为中心的建模未来：将集成数字成熟度路径与用户故事框架融入任务工程》最新文献

《人机协作集成模型中的不确定性捕获》博士论文

运用不可解释人工智能进行军事决策

《以军铁剑战争中的战场决策》最新报告

相关资讯

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

Interspeech 2019 回顾 | 从顶会看语音技术的发展趋势

Interspeech 2019 回顾 | 从顶会看语音技术的发展趋势

DataFunTalk

11+阅读 · 2020年3月10日

Interspeech 2019 | 从顶会看语音技术的发展趋势

Interspeech 2019 | 从顶会看语音技术的发展趋势

AI科技评论

16+阅读 · 2019年9月19日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

笔记 | Deep active learning for named entity recognition

笔记 | Deep active learning for named entity recognition

黑龙江大学自然语言处理实验室

24+阅读 · 2018年5月27日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

carla 体验效果及代码

carla 体验效果及代码

CreateAMind

7+阅读 · 2018年2月3日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Adversarial Driving: Attacking End-to-End Autonomous Driving Systems

Arxiv

1+阅读 · 2021年3月16日

OkwuGbé: End-to-End Speech Recognition for Fon and Igbo

Arxiv

0+阅读 · 2021年3月13日

Streaming end-to-end multi-talker speech recognition

Arxiv

0+阅读 · 2021年3月12日

Automatic Romanization of Arabic Bibliographic Records

Arxiv

0+阅读 · 2021年3月12日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

A Comparative Study on Transformer vs RNN in Speech Applications

A Comparative Study on Transformer vs RNN in Speech Applications

Arxiv

4+阅读 · 2019年9月13日

Exploring RNN-Transducer for Chinese Speech Recognition

Arxiv

4+阅读 · 2019年4月23日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Arxiv

7+阅读 · 2018年1月18日

微信扫码咨询专知VIP会员