End-to-end automatic speech recognition (ASR) systems are increasingly popular due to their relative architectural simplicity and competitive performance. However, even when the average accuracy of these systems is high, their performance on rare content words often lags behind that of hybrid ASR systems. To address this problem, second-pass rescoring with a language model is often applied. In this paper, we propose a second-pass system with multi-task learning that uses semantic targets (such as intent and slot prediction) to improve speech recognition performance. We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1.4% on a general test set and by 2.6% on a rare word test set in terms of relative word error rate reduction (WERR).
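For readers unfamiliar with the metric, the following is a minimal sketch of how a relative word error rate figure can be computed. It assumes the common definition, in which WER is the word-level edit distance divided by the number of reference words and WERR is the WER improvement divided by the baseline WER. The function names `word_errors`, `wer`, and `werr` are hypothetical and are not taken from the paper, and the sketch does not reproduce the paper's evaluation pipeline.

```python
from typing import Sequence


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: substitutions + insertions + deletions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level WER: total word errors over total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def werr(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of a new system over a baseline (assumed definition)."""
    return (baseline_wer - new_wer) / baseline_wer
```

Under this definition, a gain of 2.6% means the new system removes 2.6% of the baseline's word errors. It does not mean that the absolute WER drops by 2.6 percentage points.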