End-to-end automatic speech recognition (ASR) systems are increasingly popular due to their relative architectural simplicity and competitive performance. However, even though the average accuracy of these systems may be high, their performance on rare content words often lags behind that of hybrid ASR systems. To address this problem, second-pass rescoring with a language model is often applied. In this paper, we propose a second-pass system with multi-task learning that uses semantic targets (such as intent and slot prediction) to improve speech recognition performance. We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1.4% on a general test set and by 2.6% on a rare-word test set in terms of relative word error rate reduction (WERR). Our best ASR system with the multi-task LM achieves a 4.6% WERR over an RNN-Transducer-only ASR baseline on rare-word recognition.
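The abstract reports its gains as WERR without spelling out the formula. A minimal sketch, assuming the common convention that WERR is the reduction in word error rate relative to the baseline, expressed in percent; the function name `werr` and the WER values below are hypothetical and are not taken from the paper.

```python
def werr(baseline_wer: float, candidate_wer: float) -> float:
    """Relative word error rate reduction, in percent.

    Assumed convention: 100 * (WER_baseline - WER_candidate) / WER_baseline.
    A positive value means the candidate system makes fewer errors.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - candidate_wer) / baseline_wer


# Hypothetical WERs chosen only to illustrate the arithmetic.
improvement = werr(baseline_wer=10.0, candidate_wer=9.5)
regression = werr(baseline_wer=10.0, candidate_wer=11.0)
```

Under this convention a relative figure says nothing about the absolute WER of either system, so the reported percentages are only comparable when they share the same baseline.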