The choice of modeling units affects acoustic modeling performance and plays an important role in automatic speech recognition (ASR). In Mandarin scenarios, Chinese characters represent meaning but are not directly related to pronunciation, so considering only the written form of Chinese characters as modeling units is insufficient to capture speech features. In this paper, we present a novel method involving multi-level modeling units, which integrates multi-level information for Mandarin speech recognition. Specifically, the encoder block uses syllables as modeling units, while the decoder block uses Chinese characters. During inference, input feature sequences are converted into syllable sequences by the encoder block and then into Chinese characters by the decoder block. This process is carried out by a unified end-to-end model without introducing additional conversion models. By introducing the InterCE auxiliary task, our method achieves competitive results on the widely used AISHELL-1 benchmark without a language model: a CER of 4.1%/4.6% with the Conformer backbone and 4.6%/5.2% with the Transformer backbone.
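To make the two-stage inference flow concrete, here is a minimal toy sketch in Python. The encoder and decoder are stand-in dictionary lookups rather than neural networks, and the syllable labels and phrase table are illustrative assumptions, not the paper's actual model or data:

```python
# Toy sketch of the multi-level inference pipeline: encoder maps acoustic
# features to syllables, decoder maps syllables to Chinese characters,
# all inside one "model" with no external conversion component.

def encode_to_syllables(feature_frames):
    # In the real method this is a Conformer/Transformer encoder trained
    # with syllable targets; here each frame is assumed to already carry
    # its recognized syllable (an illustrative simplification).
    return [frame["syllable"] for frame in feature_frames]

# Assumed toy mapping: the decoder resolves homophonous syllables into
# characters from context; a real decoder attends over encoder states.
PHRASE_TABLE = {("yu3", "yin1"): "语音"}

def decode_to_characters(syllables):
    return PHRASE_TABLE.get(tuple(syllables), "")

def recognize(feature_frames):
    # Unified end-to-end pass: syllable-to-character conversion happens
    # inside the same pipeline, not in a separate model.
    return decode_to_characters(encode_to_syllables(feature_frames))

frames = [{"syllable": "yu3"}, {"syllable": "yin1"}]
print(recognize(frames))  # prints 语音 ("speech")
```

The sketch only illustrates the division of labor between the two blocks; in the actual system both stages are learned jointly and trained with the InterCE auxiliary task.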