Recently, several studies reported that dot-product self-attention (SA) may not be indispensable to state-of-the-art Transformer models. Motivated by the fact that dense synthesizer attention (DSA), which dispenses with dot products and pairwise interactions, has achieved competitive results in many language processing tasks, in this paper we first propose a DSA-based speech recognition model as an alternative to SA. To reduce the computational complexity and improve the performance, we further propose local DSA (LDSA), which restricts the attention scope of DSA to a local range around the current frame for speech recognition. Finally, we combine LDSA with SA to extract local and global information simultaneously. Experimental results on the Ai-shell1 Mandarin speech recognition corpus show that the proposed LDSA-Transformer achieves a character error rate (CER) of 6.49%, which is slightly better than that of the SA-Transformer, while requiring less computation. The proposed combination method not only achieves a CER of 6.18%, which significantly outperforms the SA-Transformer, but also has roughly the same number of parameters and computational complexity as the latter. The implementation of the multi-head LDSA is available at https://github.com/mlxu995/multihead-LDSA.
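To make the idea concrete, the following is a minimal sketch of a single-head LDSA layer in PyTorch: a small feed-forward network synthesizes a fixed number of attention weights per frame (no query-key dot products), and the weights are applied only to a local window centered on the current frame. The layer names (`w1`, `w2`, `value`), the default window size, and the unfold-based gathering of local frames are illustrative assumptions, not taken from the released multihead-LDSA code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalDSA(nn.Module):
    """Sketch of single-head local dense synthesizer attention (LDSA)."""

    def __init__(self, d_model: int, window: int = 15):
        super().__init__()
        assert window % 2 == 1, "use an odd window so the current frame is centered"
        self.window = window
        # Two-layer FFN that synthesizes `window` attention logits per frame,
        # replacing query-key dot products (the DSA idea); fixing the output
        # size to a local window is what makes it "local" DSA.
        self.w1 = nn.Linear(d_model, d_model)
        self.w2 = nn.Linear(d_model, window)
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        logits = self.w2(torch.relu(self.w1(x)))        # (B, T, window)
        attn = torch.softmax(logits, dim=-1)            # local attention weights
        v = self.value(x)                               # (B, T, D)
        # For every frame t, gather the values of its local neighbours
        # [t - window//2, ..., t + window//2] (zero-padded at the edges).
        pad = self.window // 2
        v_padded = F.pad(v.transpose(1, 2), (pad, pad)) # (B, D, T + 2*pad)
        v_local = v_padded.unfold(2, self.window, 1)    # (B, D, T, window)
        v_local = v_local.permute(0, 2, 3, 1)           # (B, T, window, D)
        # Weighted sum over the local window.
        return torch.einsum("btw,btwd->btd", attn, v_local)

if __name__ == "__main__":
    x = torch.randn(2, 100, 256)                        # 2 utterances, 100 frames
    out = LocalDSA(d_model=256, window=15)(x)
    print(out.shape)                                    # torch.Size([2, 100, 256])
```

Because the synthesized weights have a fixed width equal to the window size rather than the sequence length, both the parameter count and the per-frame computation are independent of utterance length, which is the source of the computational savings over SA claimed above.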