Speaker adaptation is important for building robust automatic speech recognition (ASR) systems. In this work, we investigate various feature-space methods for speaker adaptive training (SAT) of a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the multi-head self-attention module of the conformer AM. Using this method for SAT, we achieve 3.5% and 4.5% relative improvement in word error rate (WER) on the CallHome part of Hub5'00 and Hub5'01, respectively. Moreover, we build on our previous work, in which we proposed a novel and competitive training recipe for a conformer-based hybrid AM. We extend and improve this recipe, achieving an 11% relative improvement in WER on the Switchboard 300h Hub5'00 dataset. We also make the recipe more efficient by reducing the total number of parameters by 34% relative.
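The core idea of Weighted-Simple-Add, as described above, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `weighted_simple_add`, the linear projection `proj` that maps the speaker vector to the model dimension, and the use of a single scalar `weight` are all assumptions made for illustration; the abstract only states that weighted speaker information vectors are added to the input of the multi-head self-attention module.

```python
import numpy as np


def weighted_simple_add(x, speaker_vec, proj, weight):
    """Add a weighted speaker vector to every frame of the attention input.

    Hypothetical helper, not taken from the paper.
    x:           (T, D) frame sequence entering the self-attention module
    speaker_vec: (S,)   per-speaker vector (for example an i-vector; assumed)
    proj:        (S, D) assumed linear map from speaker space to model space
    weight:      scalar applied to the projected speaker vector (assumed form)
    """
    speaker_emb = speaker_vec @ proj   # shape (D,)
    return x + weight * speaker_emb    # broadcast over the T frames


# Toy dimensions, chosen only for the example.
rng = np.random.default_rng(0)
T, D, S = 4, 8, 3
x = rng.standard_normal((T, D))
spk = rng.standard_normal(S)
proj = rng.standard_normal((S, D))

out = weighted_simple_add(x, spk, proj, weight=0.5)
```

In this sketch every frame receives the same speaker-dependent offset, and setting `weight` to zero recovers the unadapted input. In a trained model the projection and the weight would presumably be learned jointly with the AM.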