State-of-the-art hybrid automatic speech recognition (ASR) systems exploit deep neural network (DNN) based acoustic models (AM) trained with the Lattice-Free Maximum Mutual Information (LF-MMI) criterion and n-gram language models. The AMs typically have millions of parameters and require significant parameter reduction to operate on embedded devices. This paper studies the impact of parameter quantization on overall word recognition performance. The following approaches are presented: (i) an AM trained in the Kaldi framework with the conventional factorized TDNN (TDNN-F) architecture, (ii) the Kaldi-built TDNN AM loaded into the PyTorch toolkit through a C++ wrapper for post-training quantization, (iii) quantization-aware training in PyTorch for the Kaldi TDNN model, and (iv) quantization-aware training in Kaldi. Results obtained on the standard LibriSpeech setup provide an interesting overview of recognition accuracy with respect to the applied quantization scheme.
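As a concrete illustration of approach (ii), the sketch below shows post-training dynamic quantization in PyTorch. It assumes the Kaldi TDNN-F weights have already been imported into equivalent torch.nn.Linear layers; the module, its layer sizes, and all names are hypothetical stand-ins for illustration, not the paper's actual C++ wrapper or model topology.

# Minimal sketch of post-training dynamic quantization in PyTorch.
# Assumption: the Kaldi TDNN-F acoustic model has already been imported
# as a stack of torch.nn.Linear layers (layer sizes here are hypothetical).
import torch
import torch.nn as nn

class TdnnfLikeAm(nn.Module):
    """Stand-in for a factorized TDNN acoustic model: each TDNN-F layer
    is approximated by a bottleneck projection followed by an expansion."""
    def __init__(self, feat_dim=40, hidden=1024, bottleneck=128, num_pdfs=6000):
        super().__init__()
        layers = [nn.Linear(feat_dim * 3, hidden), nn.ReLU()]  # spliced input frames
        for _ in range(5):
            layers += [nn.Linear(hidden, bottleneck),   # factorized projection
                       nn.Linear(bottleneck, hidden),   # expansion back to hidden dim
                       nn.ReLU()]
        layers.append(nn.Linear(hidden, num_pdfs))       # pdf-id outputs for decoding
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = TdnnfLikeAm().eval()

# Post-training dynamic quantization: the weights of every nn.Linear are
# stored in int8, while activations are quantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity check: run a batch of spliced 40-dim features through both models.
feats = torch.randn(8, 40 * 3)
print(model(feats).shape, quantized(feats).shape)

Dynamic quantization is the lightest-weight of PyTorch's post-training schemes; static quantization and quantization-aware training (approaches iii and iv above) additionally calibrate or learn activation ranges, typically recovering more accuracy at low bit widths.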