End-to-end automatic speech recognition (ASR) systems are increasingly popular due to their relative architectural simplicity and competitive performance. However, even though the average accuracy of these systems may be high, their performance on rare content words often lags behind that of hybrid ASR systems. To address this problem, second-pass rescoring is often applied. In this paper, we propose a second-pass system with multi-task learning, utilizing semantic targets (such as intent and slot prediction) to improve speech recognition performance. We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1.4% on a general test set and by 2.6% on a rare word test set in terms of relative word error rate reduction (WERR).
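The reported gains are relative, not absolute, changes in word error rate. As a minimal sketch of how such a figure is commonly computed, the snippet below assumes the usual definitions: WER is the word-level edit distance divided by the number of reference words, and WERR is the baseline WER minus the new WER, divided by the baseline WER. The function names and the example numbers are illustrative assumptions, not taken from the paper.

```python
from typing import Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference words
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER over (reference, hypothesis) pairs."""
    pairs = list(pairs)
    errors = sum(word_errors(r, h) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words


def werr(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: positive means the new system is better."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, for illustration only: a baseline WER of 10.00%
# and a rescored WER of 9.74% correspond to a WERR of about 2.6%.
example = werr(0.1000, 0.0974)
```

Under this definition, a 2.6% WERR on a rare word test set is a small absolute change when the baseline WER is already low, which is why relative figures are usually quoted alongside the test set they refer to.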