End-to-end automatic speech recognition (ASR) models aim to learn a generalised speech representation for recognition. Little research in this domain has analysed internal representation dependencies and their relationship to modelling approaches. This paper investigates cross-domain language model dependencies within transformer architectures using singular vector canonical correlation analysis (SVCCA) and uses these insights to inform modelling approaches. We find that specific neural representations within the transformer layers exhibit correlated behaviour that impacts recognition performance. Altogether, this work analyses the modelling approaches that affect contextual dependencies and ASR performance, and its findings can be used to create or adapt better-performing end-to-end ASR models, as well as for downstream tasks.
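To make the analysis method concrete, the following is a minimal sketch of SVCCA for comparing two layers' activation matrices: each representation is first reduced with an SVD keeping most of the variance, then canonical correlations between the reduced subspaces are computed and averaged. This is a generic NumPy illustration under stated assumptions (energy threshold, activation shapes), not the paper's own implementation.

```python
import numpy as np

def svcca(X, Y, keep=0.99):
    """Minimal SVCCA similarity between two activation matrices.

    X, Y: arrays of shape (n_examples, n_neurons); `keep` is the fraction
    of SVD energy to retain (an illustrative choice, not the paper's).
    Returns the mean canonical correlation in [0, 1].
    """
    def svd_reduce(A, keep):
        A = A - A.mean(axis=0)                      # centre each neuron
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        energy = np.cumsum(s ** 2) / np.sum(s ** 2)
        k = int(np.searchsorted(energy, keep)) + 1  # smallest k covering `keep`
        return U[:, :k] * s[:k]                     # reduced representation

    Xr, Yr = svd_reduce(X, keep), svd_reduce(Y, keep)
    # Canonical correlations = singular values of Qx^T Qy, where Qx, Qy
    # are orthonormal bases of the (centred) reduced subspaces.
    Qx, _ = np.linalg.qr(Xr)
    Qy, _ = np.linalg.qr(Yr)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.mean(np.clip(rho, 0.0, 1.0)))
```

A layer compared with itself scores 1.0, while unrelated random representations score much lower, which is the sense in which SVCCA quantifies correlated behaviour across layers or models.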