Sequence-to-sequence attention-based models, which integrate the acoustic, pronunciation, and language models into a single neural network, have recently shown very promising results on automatic speech recognition (ASR) tasks. Among these models, the Transformer, a sequence-to-sequence attention-based model that relies entirely on self-attention without using RNNs or convolutions, has achieved a new single-model state-of-the-art BLEU score on neural machine translation (NMT) tasks. Motivated by the outstanding performance of the Transformer, we extend it to speech and adopt it as the basic architecture of our sequence-to-sequence attention-based models for Mandarin Chinese ASR tasks. Furthermore, we compare syllable based and context-independent phoneme (CI-phoneme) based models with the Transformer in Mandarin Chinese. Additionally, a greedy cascading decoder with the Transformer is proposed to map CI-phoneme sequences and syllable sequences into word sequences. Experiments on the HKUST dataset demonstrate that the syllable based model with the Transformer outperforms its CI-phoneme based counterpart, achieving a character error rate (CER) of \emph{$28.77\%$}, which is competitive with the state-of-the-art CER of $28.0\%$ obtained by the joint CTC-attention based encoder-decoder network.
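For context, the core operation of the Transformer is scaled dot-product self-attention (Vaswani et al., 2017), in which the queries $Q$, keys $K$, and values $V$ are all projections of the same input sequence:
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,
\]
where $d_k$ is the dimension of the keys and the $\sqrt{d_k}$ scaling keeps the logits of the softmax well conditioned.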
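For reference, the character error rate (CER) reported above is the character-level edit (Levenshtein) distance between the hypothesis and the reference transcript, normalized by the reference length. The following is a minimal Python sketch of this metric; the function names are illustrative and are not taken from the paper's code:
\begin{verbatim}
def edit_distance(ref, hyp):
    # Single-row dynamic-programming Levenshtein distance over characters.
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def cer(ref, hyp):
    # CER = edit distance / number of reference characters.
    return edit_distance(ref, hyp) / len(ref)

# Example: one substituted character out of four -> CER of 0.25.
print(cer("今天天气", "今天天汽"))
\end{verbatim}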