Self-attention networks (SANs) have recently attracted increasing interest due to their fully parallelized computation and flexibility in modeling dependencies. They can be further enhanced with a multi-head attention mechanism, which allows the model to jointly attend to information from different representation subspaces at different positions (Vaswani et al., 2017). In this work, we propose a novel convolutional self-attention network (CSAN), which offers SANs the ability to 1) capture neighboring dependencies, and 2) model the interaction among multiple attention heads. Experimental results on the WMT14 English-to-German translation task demonstrate that the proposed approach outperforms both a strong Transformer baseline and existing approaches to enhancing the locality of SANs. In contrast to previous work, our model introduces no new parameters.
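To make "capture neighboring dependencies" concrete, the sketch below shows one common way locality is imposed on self-attention: each position attends only to a fixed window of neighbors, with out-of-window scores masked out before the softmax. This is an illustrative assumption about the general technique, not the paper's exact CSAN formulation; the function name `local_self_attention` and the `window` parameter are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(Q, K, V, window=1):
    """Scaled dot-product self-attention where each position
    attends only to positions within +/- `window` of itself."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    idx = np.arange(n)
    # mask out pairs farther apart than the window
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -1e9
    return softmax(scores) @ V

# toy example: sequence of 6 positions, hidden dim 4
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
out = local_self_attention(x, x, x, window=1)
print(out.shape)  # (6, 4)
```

With a window at least as wide as the sequence, this reduces to standard (global) self-attention, which is a convenient sanity check.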