Despite the rapid progress of end-to-end (E2E) automatic speech recognition (ASR), it has been shown that incorporating external language models (LMs) into decoding can further improve the recognition performance of E2E ASR systems. To align with the modeling units adopted in E2E ASR systems, subword-level LMs (e.g., character- or BPE-level) are usually used with current E2E ASR systems. However, subword-level LMs ignore word-level information, which may limit the strength of external LMs in E2E ASR. Although several methods have been proposed to incorporate word-level external LMs into E2E ASR, these methods are mainly designed for languages with clear word boundaries, such as English, and cannot be directly applied to languages like Mandarin, in which each character sequence can have multiple corresponding word sequences. To this end, we propose a novel decoding algorithm in which a word-level lattice is constructed on-the-fly to consider all possible word sequences for each partial hypothesis. The LM score of the hypothesis is then obtained by intersecting the generated lattice with an external word N-gram LM. The proposed method is evaluated on both Attention-based Encoder-Decoder (AED) and Neural Transducer (NT) frameworks. Experiments suggest that our method consistently outperforms subword-level LMs, including N-gram LMs and neural network LMs. We achieve state-of-the-art results on both the Aishell-1 (CER 4.18%) and Aishell-2 (CER 5.06%) datasets, and reduce CER by 14.8% relative on a 21K-hour Mandarin dataset.
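The core difficulty the abstract describes, that one Mandarin character sequence admits many word segmentations, can be illustrated with a minimal sketch. The toy vocabulary and bigram log-probabilities below are illustrative assumptions, not the paper's models, and for brevity the sketch enumerates all segmentation paths explicitly rather than intersecting a lattice with an N-gram LM as the proposed algorithm does:

```python
# Minimal sketch: enumerate all word segmentations of a (partial) character
# hypothesis -- the paths of the word-level lattice -- and return the score
# of the best path under a toy word bigram LM.

# Hypothetical toy vocabulary and bigram log-probabilities (assumptions).
VOCAB = {"我", "们", "我们", "爱", "中", "国", "中国"}
BIGRAM = {("<s>", "我们"): -0.5, ("我们", "爱"): -0.7, ("爱", "中国"): -0.6}
FLOOR = -5.0  # back-off score for unseen bigrams


def lm_score(words):
    """Score a word sequence with the toy bigram LM."""
    total, prev = 0.0, "<s>"
    for w in words:
        total += BIGRAM.get((prev, w), FLOOR)
        prev = w
    return total


def segmentations(chars):
    """All word sequences whose concatenation equals `chars`."""
    if not chars:
        return [[]]
    paths = []
    for i in range(1, len(chars) + 1):
        if chars[:i] in VOCAB:
            paths += [[chars[:i]] + rest for rest in segmentations(chars[i:])]
    return paths


def best_lattice_score(chars):
    """Best word-level LM score over all segmentations of the hypothesis."""
    paths = segmentations(chars)
    return max(lm_score(p) for p in paths) if paths else FLOOR


# "我们爱中国" has four segmentations here; the best path is 我们 / 爱 / 中国.
print(best_lattice_score("我们爱中国"))  # -> -1.8
```

In the actual method, this enumeration is replaced by composing the on-the-fly lattice with the N-gram LM (represented as a WFSA), which shares common prefixes among paths instead of scoring each segmentation independently.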