Although the Transformer translation model (Vaswani et al., 2017) has achieved state-of-the-art performance on a variety of translation tasks, it remains a challenge to exploit document-level context to handle discourse phenomena that are problematic for the Transformer. In this work, we extend the Transformer with a new context encoder that represents document-level context, which is then incorporated into the original encoder and decoder. Since large-scale document-level parallel corpora are usually unavailable, we introduce a two-step training method that takes full advantage of abundant sentence-level parallel corpora and limited document-level parallel corpora. Experiments on the NIST Chinese-English and IWSLT French-English datasets show that our approach improves significantly over the Transformer.
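To make the described extension concrete, the sketch below shows one plausible way a context encoder's output could be incorporated into an encoder layer via an extra multi-head attention sub-layer, and how a second training step might update only the new context parameters while the sentence-level parameters stay frozen. This is a minimal illustration under assumed design choices, not the authors' implementation; names such as `ContextAwareEncoderLayer` and `ctx_attn` are hypothetical.

```python
# Hypothetical sketch of a context-aware Transformer encoder layer:
# the current source sentence attends to representations produced by a
# separate document-level context encoder. All names are illustrative.
import torch
import torch.nn as nn

class ContextAwareEncoderLayer(nn.Module):
    def __init__(self, d_model=512, nhead=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # Extra attention sub-layer over document-level context (assumption).
        self.ctx_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, ctx):
        # Standard self-attention over the current sentence.
        x = self.norm1(x + self.self_attn(x, x, x)[0])
        # Attend to the context encoder's output (e.g., previous sentences).
        x = self.norm2(x + self.ctx_attn(x, ctx, ctx)[0])
        return self.norm3(x + self.ffn(x))

layer = ContextAwareEncoderLayer()
x = torch.randn(2, 10, 512)    # current sentence states (batch, len, dim)
ctx = torch.randn(2, 40, 512)  # document-level context states
out = layer(x, ctx)            # -> shape (2, 10, 512)

# Two-step training, sketched: after pre-training the sentence-level
# parameters on abundant sentence-level data, freeze them and estimate
# only the new context parameters on the limited document-level corpus.
for name, p in layer.named_parameters():
    p.requires_grad = "ctx_attn" in name
```

A decoder layer could reuse the same `ctx_attn` pattern alongside its usual encoder-decoder attention; freezing everything but the context parameters in the second step keeps the model from overfitting the small document-level corpus.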