Neural sequence-to-sequence networks with attention have achieved remarkable performance for machine translation. One of the reasons for their effectiveness is their ability to capture relevant source-side contextual information at each decoding time step through an attention mechanism. However, the target-side context is modeled solely by the sequential decoder, which in practice is prone to a recency bias and is unable to effectively capture non-sequential dependencies among words. To address this limitation, we propose a target-side-attentive residual recurrent network for decoding, in which attention over previously generated words contributes directly to the prediction of the next word. The residual learning facilitates the flow of information from the distant past and allows the decoder to emphasize any of the previously translated words, thus gaining access to a wider context. The proposed model outperforms a neural MT baseline as well as a memory and a self-attention network on three language pairs. The analysis of the attention learned by the decoder confirms that it emphasizes a wider context and that it captures syntactic-like structures.
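The sketch below is a minimal, hedged illustration of the decoding step described above: at each step, the decoder attends over the hidden states of previously generated target words and combines the resulting context with the current state through a residual (additive) connection before predicting the next word. It assumes a PyTorch GRU decoder with bilinear attention scoring, omits source-side attention for brevity, and all names (e.g. `TargetAttentiveResidualDecoder`, `attn_score`) are illustrative rather than taken from the paper.

```python
# Minimal sketch of a target-side-attentive residual decoder step (assumptions:
# GRU decoder, bilinear scoring, additive residual; source-side attention omitted).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TargetAttentiveResidualDecoder(nn.Module):
    def __init__(self, emb_dim: int, hid_dim: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRUCell(emb_dim, hid_dim)
        # Bilinear-style scoring of past target states against the current state (assumed form).
        self.attn_score = nn.Linear(hid_dim, hid_dim, bias=False)
        self.out = nn.Linear(hid_dim, vocab_size)

    def step(self, prev_word, state, past_states):
        # prev_word: (batch,) previous target token ids
        # state: (batch, hid) current decoder hidden state
        # past_states: list of (batch, hid) hidden states of previously translated words
        state = self.rnn(self.embed(prev_word), state)
        if past_states:
            mem = torch.stack(past_states, dim=1)                       # (batch, t, hid)
            scores = torch.bmm(mem, self.attn_score(state).unsqueeze(2)).squeeze(2)
            alpha = F.softmax(scores, dim=1)                            # attention over previous target words
            context = torch.bmm(alpha.unsqueeze(1), mem).squeeze(1)     # (batch, hid)
            state_for_output = state + context                          # residual combination (assumed additive)
        else:
            state_for_output = state                                    # first step: no target-side history yet
        logits = self.out(state_for_output)
        return logits, state
```

In this reading, the residual path lets the prediction draw directly on any earlier target word rather than only on what the recurrent state has retained, which is the wider-context behavior the abstract claims.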