In recent years, the Transformer network has attracted a great deal of attention in speech recognition because of its excellent performance. However, the Transformer typically involves heavy computation and a large number of parameters, which makes it difficult to deploy on devices with limited computational resources or memory. In this paper, a new lightweight model called Sim-T is proposed to broaden the applicability of the Transformer model. With the help of a newly developed multiplexing technique, Sim-T compresses the model efficiently with a negligible sacrifice in performance. More precisely, the proposed technique consists of two parts: module weight multiplexing and attention score multiplexing. In addition, a novel decoder structure is proposed to facilitate attention score multiplexing. Extensive experiments have been conducted to validate the effectiveness of Sim-T. On the Aishell-1 dataset, Sim-T improves CER by 0.4% while using 48% fewer parameters than the baseline Transformer. Alternatively, when Sim-T is set to match the performance of the baseline Transformer, a 69% parameter reduction can be achieved. On the HKUST and WSJ eval92 datasets, CER and WER improve by 0.3% and 0.2%, respectively, when Sim-T has 40% fewer parameters than the baseline Transformer.
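The abstract does not describe how Sim-T shares weights, so the following is only a rough illustration of the general idea behind module weight multiplexing: reusing one set of layer weights at several depths of the stack so that the number of stored parameters drops while the depth stays the same. Everything here is an assumption made for illustration (the `ToyLayer` class, the grouping rule in `build_stack`, and the sizes `d_model=256`, `d_ff=2048`); it is not the Sim-T architecture and does not reproduce the reported 48% or 69% figures.

```python
class ToyLayer:
    """Hypothetical stand-in for one Transformer layer; only tracks its size."""

    def __init__(self, d_model, d_ff):
        # Four d_model x d_model attention projections (Q, K, V, output)
        # plus two feed-forward matrices; biases and norms are ignored.
        self.n_params = 4 * d_model * d_model + 2 * d_model * d_ff


def build_stack(num_positions, num_unique, d_model=256, d_ff=2048):
    """Build a stack of `num_positions` layers backed by `num_unique` weight sets.

    Assumed grouping rule: consecutive positions share the same layer object.
    """
    unique = [ToyLayer(d_model, d_ff) for _ in range(num_unique)]
    group = num_positions // num_unique
    return [unique[min(i // group, num_unique - 1)] for i in range(num_positions)]


def count_params(stack):
    """Count parameters once per distinct layer object, however often it is reused."""
    seen = {}
    for layer in stack:
        seen[id(layer)] = layer.n_params
    return sum(seen.values())


baseline = build_stack(num_positions=12, num_unique=12)  # no sharing
shared = build_stack(num_positions=12, num_unique=6)     # each weight set used twice
reduction = 1 - count_params(shared) / count_params(baseline)
print(f"parameter reduction: {reduction:.0%}")
```

In this toy setting, sharing each weight set between two positions halves the stored layer parameters. A real model would also carry embeddings, output layers, and other unshared parameters, so the overall reduction would differ.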