Neural network-based language models are commonly used in rescoring approaches to improve the quality of modern automatic speech recognition (ASR) systems. Most existing methods are computationally expensive, since they rely on autoregressive language models. We propose a novel rescoring approach that processes the entire lattice in a single call to the model. The key feature of our rescoring policy is a novel non-autoregressive Lattice Transformer Language Model (LT-LM). This model takes the whole lattice as input and predicts a new language score for each arc. Additionally, we propose an artificial lattice generation approach to incorporate a large amount of text data into the LT-LM training process. In our experiments, single-shot rescoring performs orders of magnitude faster than other rescoring methods: it is more than 300 times faster than pruned RNNLM lattice rescoring and N-best rescoring, while being only slightly inferior in terms of WER.
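To make the single-shot idea concrete, the sketch below shows a non-autoregressive model that assigns a language score to every lattice arc in one forward pass. This is an illustrative assumption in PyTorch, not the authors' implementation: the class name LatticeTransformerLM, the arc feature layout, and the use of full self-attention (standing in for lattice-aware attention) are all hypothetical.

```python
# Minimal sketch of single-shot lattice rescoring (illustrative, not the
# paper's actual architecture). One forward pass scores all arcs at once,
# which is where the speedup over autoregressive rescoring comes from.
import torch
import torch.nn as nn

class LatticeTransformerLM(nn.Module):  # hypothetical name
    """Non-autoregressive model: one call predicts a score per lattice arc."""

    def __init__(self, vocab_size: int, d_model: int = 256,
                 n_layers: int = 4, n_heads: int = 4):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.score_head = nn.Linear(d_model, 1)  # one language score per arc

    def forward(self, arc_words: torch.Tensor,
                pad_mask: torch.Tensor) -> torch.Tensor:
        # arc_words: (batch, n_arcs) word id carried by each lattice arc
        # pad_mask:  (batch, n_arcs) True marks padding arcs
        x = self.word_emb(arc_words)
        # The paper constrains attention by lattice topology; plain
        # self-attention is used here only to keep the sketch short.
        h = self.encoder(x, src_key_padding_mask=pad_mask)
        return self.score_head(h).squeeze(-1)  # (batch, n_arcs) new scores

# Usage: a single call rescores a toy lattice with 57 arcs.
model = LatticeTransformerLM(vocab_size=10_000)
arcs = torch.randint(0, 10_000, (1, 57))
mask = torch.zeros(1, 57, dtype=torch.bool)  # no padding in this example
new_scores = model(arcs, mask)               # shape (1, 57)
```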