Cross-attention is an important component of neural machine translation (NMT), and in previous methods it is typically realized by dot-product attention. However, dot-product attention only considers the pair-wise correlation between words, which causes attention dispersion on long sentences and neglects neighboring relationships in the source. From a linguistic perspective, these issues arise from ignoring another type of cross-attention, called concentrated attention, which focuses on several central words and then spreads around them. In this work, we apply a Gaussian Mixture Model (GMM) to model concentrated attention in cross-attention. Experiments and analyses on three datasets show that the proposed method outperforms the baseline and yields significant improvements in alignment quality, N-gram accuracy, and long-sentence translation.
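As an illustration of what such a formulation might look like (a sketch only, not the paper's exact equation; the multiplicative combination and the symbols $\pi_{i,m}$, $\mu_{i,m}$, $\sigma_{i,m}$ are assumptions), the cross-attention weight from target position $i$ to source position $j$ could combine the standard dot-product score with an $M$-component Gaussian mixture over source positions:

$$\alpha_{ij} \;\propto\; \exp\!\left(\frac{q_i k_j^{\top}}{\sqrt{d}}\right)\,\sum_{m=1}^{M}\pi_{i,m}\,\mathcal{N}\!\left(j \mid \mu_{i,m},\,\sigma_{i,m}^{2}\right),$$

where each mixture component is centered on a predicted central word $\mu_{i,m}$, with mixture weight $\pi_{i,m}$, and spreads attention to neighboring source words through the variance $\sigma_{i,m}^{2}$.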