Multilingual Automatic Speech Recognition (ASR) models have extended the usability of speech technologies to a wide variety of languages. However, given the large number of languages these models must handle, a key to understanding their imbalanced performance across languages is to examine whether the model actually knows which language it should transcribe. In this paper, we introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark, by conditioning the entire model on language identity (LID). We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages, conditioning on the LID predictions of auxiliary tasks. Our experimental results demonstrate the effectiveness of our technique over standard CTC/Attention-based hybrid models. Furthermore, our state-of-the-art systems, which combine self-supervised models with the Conformer architecture, improve over the results of prior work on FLEURS by a relative 28.4% CER. Trained models and reproducible recipes are available at https://github.com/espnet/espnet/tree/master/egs2/fleurs/asr1.