Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition

The choice of modeling units is crucial for automatic speech recognition (ASR) tasks. In Mandarin, Chinese characters represent meaning but are not directly related to pronunciation. Thus, using only written Chinese characters as modeling units is insufficient to capture speech features. In this paper, we present a novel method based on multi-level modeling units, which integrates multi-level information for Mandarin speech recognition. Specifically, the encoder block uses syllables as modeling units and the decoder block works with character-level modeling units. To facilitate the incremental conversion from syllable features to character features, we design an auxiliary task that applies cross-entropy (CE) loss to intermediate decoder layers. During inference, the input feature sequences are converted into syllable sequences by the encoder block and then converted into Chinese characters by the decoder block. Experiments on the widely used AISHELL-1 corpus demonstrate that our method achieves promising results, with character error rates (CER) of 4.1%/4.6% and 4.6%/5.2% using the Conformer and the Transformer backbones, respectively.
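As a rough illustration of the auxiliary task described above, the sketch below combines a syllable-level encoder loss with character-level CE losses taken from the final and intermediate decoder layers. The abstract does not give the exact formulation, so several things here are assumptions: the function names `cross_entropy` and `multi_level_loss`, the `aux_weight` parameter and its default value, the averaging over intermediate layers, and the idea that the encoder's syllable loss is computed elsewhere and passed in as a number. It is written in plain Python to stay self-contained rather than to mirror any particular toolkit.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean token-level cross-entropy computed from unnormalized logits."""
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[target]
    return total / len(targets)


def multi_level_loss(
    encoder_syllable_loss: float,
    decoder_layer_logits: List[Sequence[Sequence[float]]],
    char_targets: Sequence[int],
    aux_weight: float = 0.5,  # hypothetical weight, not taken from the paper
) -> float:
    """Combine syllable-level and character-level objectives (illustrative only).

    decoder_layer_logits holds one logits matrix per decoder layer; the last
    entry is the final layer and the earlier entries are intermediate layers.
    """
    *intermediate, final = decoder_layer_logits
    final_ce = cross_entropy(final, char_targets)

    # Auxiliary task: CE on intermediate decoder layers, averaged (an assumption).
    aux_ce = 0.0
    if intermediate:
        aux_ce = sum(cross_entropy(layer, char_targets) for layer in intermediate)
        aux_ce /= len(intermediate)

    return encoder_syllable_loss + final_ce + aux_weight * aux_ce
```

In this sketch the intermediate-layer term nudges earlier decoder layers toward character predictions, which is one plausible reading of "incremental conversion from syllable features to character features"; the actual weighting and layer selection used by the authors may differ.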

