Transformer-based models have recently achieved significant success in end-to-end (E2E) automatic speech recognition (ASR), making it feasible to deploy E2E ASR systems on smart devices. However, these models still require a large number of parameters. To overcome this drawback of universal Transformer models for ASR on edge devices, we propose a solution that reuses blocks in Transformer models for small-footprint ASR systems, meeting the objective of accommodating resource limitations without compromising recognition accuracy. Specifically, we design a novel block-reusing strategy for the speech Transformer (BRST) to enhance parameter efficiency, and we propose an adapter module (ADM) that produces a compact and adaptable model with only a few additional trainable parameters accompanying each reused block. We conducted experiments with the proposed method on the public AISHELL-1 corpus, and the results show that the proposed approach achieves a character error rate (CER) of 9.3%/6.63% with only 7.6M/8.3M parameters, without and with the ADM, respectively. In addition, we provide a deeper analysis of the effect of the ADM in the general block-reusing method.
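To make the block-reusing idea concrete, below is a minimal PyTorch sketch of a single shared Transformer encoder block applied repeatedly, with a small bottleneck adapter accompanying each pass. This is an illustrative interpretation of the abstract, not the authors' exact BRST/ADM implementation; the class names, layer sizes, and adapter design (a residual down-/up-projection MLP) are assumptions.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: a tiny residual MLP attached to each reuse of the
    shared block (hypothetical ADM sketch; dimensions are assumptions)."""
    def __init__(self, d_model: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual connection keeps the adapter close to an identity map.
        return x + self.up(self.act(self.down(x)))

class BlockReusedEncoder(nn.Module):
    """One Transformer encoder block whose weights are applied n_reuse times.
    Each pass gets its own small adapter, so the adapters are the only
    per-layer parameters added on top of the single shared block."""
    def __init__(self, d_model=256, nhead=4, n_reuse=12, bottleneck=32):
        super().__init__()
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.adapters = nn.ModuleList(
            [Adapter(d_model, bottleneck) for _ in range(n_reuse)])

    def forward(self, x):
        for adapter in self.adapters:  # same block weights on every pass
            x = adapter(self.shared_block(x))
        return x

# Usage: a 12-pass encoder costs roughly the parameters of one block plus
# 12 small adapters, instead of 12 full blocks.
enc = BlockReusedEncoder()
out = enc(torch.randn(8, 100, 256))  # (batch, frames, d_model)
print(out.shape)
```

Under these assumptions, dropping the adapters recovers the plain block-reusing baseline, which mirrors the paper's without-ADM vs. with-ADM comparison.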