In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems. Given this success, adopting discriminative criteria is a promising way to improve the performance of end-to-end (E2E) ASR systems. With this motivation, previous works have introduced minimum Bayes risk (MBR), one such discriminative criterion, into E2E ASR systems. However, the effectiveness and efficiency of MBR-based methods are compromised in two ways. First, the MBR criterion is used only in training, which creates a mismatch between training and decoding. Second, MBR-based methods rely on on-the-fly decoding during training, which requires pre-trained models and slows training. To address these issues, this work proposes novel algorithms that integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI), into E2E ASR systems in both the training stage and the decoding process. The proposed LF-MMI training and decoding methods are shown to be effective on two widely used E2E frameworks: Attention-Based Encoder-Decoders (AEDs) and Neural Transducers (NTs). Compared with MBR-based methods, the proposed LF-MMI method maintains consistency between training and decoding, avoids on-the-fly decoding, and trains from randomly initialized models with superior training efficiency. Experiments suggest that the LF-MMI method outperforms its MBR counterparts and consistently yields statistically significant performance improvements across various frameworks and on datasets ranging from 30 hours to 14.3k hours. The proposed method achieves state-of-the-art (SOTA) results on the Aishell-1 (CER 4.10%) and Aishell-2 (CER 5.02%) datasets. Code is released.