The Transformer architecture has led to significant gains in machine translation. However, most studies focus only on sentence-level translation and ignore the context dependencies within a document, resulting in inadequate document-level coherence. Some recent work has tried to mitigate this issue by introducing an additional context encoder or by translating multiple sentences or even the entire document at once. Such methods may lose information on the target side or incur growing computational cost as documents get longer. To address these problems, we introduce a recurrent memory unit into the vanilla Transformer that supports information exchange between the current sentence and the preceding context. The memory unit is updated recurrently: it acquires information from each sentence and passes the aggregated knowledge back to subsequent sentence states. We follow a two-stage training strategy in which the model is first trained at the sentence level and then fine-tuned for document-level translation. We conduct experiments on three popular document-level machine translation datasets, and our model achieves an average improvement of 0.91 s-BLEU over the sentence-level baseline. We also achieve state-of-the-art results on TED and News, outperforming previous work by 0.36 s-BLEU and 1.49 d-BLEU on average.
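To make the described mechanism concrete, below is a minimal PyTorch sketch of a recurrent memory unit that reads from the current sentence's states and writes aggregated context back into them. The class name, slot count, and the attention-based read/write scheme are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class RecurrentMemoryUnit(nn.Module):
    """Hypothetical sketch: a memory carried across the sentences of a document
    that exchanges information with sentence-level Transformer states."""

    def __init__(self, d_model: int, num_slots: int = 16, num_heads: int = 8):
        super().__init__()
        # Learnable initial memory slots shared by all documents.
        self.init_memory = nn.Parameter(torch.zeros(num_slots, d_model))
        # Memory reads from the current sentence states (update step).
        self.read_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Sentence states read from the updated memory (write-back step).
        self.write_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm_mem = nn.LayerNorm(d_model)
        self.norm_sent = nn.LayerNorm(d_model)

    def initial_state(self, batch_size: int) -> torch.Tensor:
        return self.init_memory.unsqueeze(0).expand(batch_size, -1, -1)

    def forward(self, sent_states: torch.Tensor, memory: torch.Tensor):
        # 1) Update the memory by attending over the current sentence's states.
        mem_update, _ = self.read_attn(memory, sent_states, sent_states)
        memory = self.norm_mem(memory + mem_update)
        # 2) Pass the aggregated context back into the sentence representation.
        ctx, _ = self.write_attn(sent_states, memory, memory)
        sent_states = self.norm_sent(sent_states + ctx)
        return sent_states, memory


# Usage: iterate over consecutive sentences of a document, carrying the memory forward.
if __name__ == "__main__":
    d_model, batch, seq_len = 512, 2, 10
    unit = RecurrentMemoryUnit(d_model)
    memory = unit.initial_state(batch)
    for _ in range(3):  # three consecutive sentences of one document
        sent = torch.randn(batch, seq_len, d_model)  # stand-in for encoder states
        sent, memory = unit(sent, memory)
```

In this sketch the memory plays the role of the previous context: it is updated after each sentence and fed back to the next one, so computation per sentence stays constant regardless of document length.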