Document-level machine translation leverages inter-sentence dependencies to produce more coherent and consistent translations. However, these models, predominantly based on transformers, are difficult to scale to long documents as their attention layers have quadratic complexity in the sequence length. Recent efforts on efficient attention improve scalability, but their effect on document translation remains unexplored. In this work, we investigate the efficacy of a recent linear attention model by Peng et al. (2021) on document translation and augment it with a sentential gate to promote a recency inductive bias. We evaluate the model on IWSLT 2015 and OpenSubtitles 2018 against the transformer, demonstrating substantially increased decoding speed on long sequences with similar or better BLEU scores. We show that sentential gating further improves translation quality on IWSLT.
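To make the mechanism concrete, below is a minimal sketch, not the paper's implementation, of causal linear attention in the style of random feature attention (Peng et al., 2021) with a sentence-level decay gate. The feature map `phi`, the fixed projection `W`, the scalar gate `g`, and the `sentence_ends` marker are all illustrative assumptions introduced here for exposition.

```python
import numpy as np

def phi(x, W):
    # Random-feature map approximating the softmax kernel;
    # W is a fixed random projection (assumed, for illustration).
    proj = x @ W
    return np.concatenate([np.sin(proj), np.cos(proj)]) / np.sqrt(W.shape[1])

def gated_linear_attention(queries, keys, values, sentence_ends, W, g=0.9):
    """Causal linear attention with a hypothetical sentence-level gate.

    sentence_ends[t] marks the last token of a sentence; the running
    state is decayed by g at those positions, biasing the model toward
    recent sentences (a recency inductive bias).
    """
    d_v = values.shape[1]
    d_phi = 2 * W.shape[1]
    S = np.zeros((d_phi, d_v))   # running sum of phi(k) v^T
    z = np.zeros(d_phi)          # running sum of phi(k) for normalisation
    outputs = []
    for t in range(len(queries)):
        k_feat = phi(keys[t], W)
        S += np.outer(k_feat, values[t])
        z += k_feat
        q_feat = phi(queries[t], W)
        outputs.append(q_feat @ S / (q_feat @ z + 1e-6))
        if sentence_ends[t]:
            S *= g   # decay contributions from earlier sentences
            z *= g
    return np.stack(outputs)
```

Because the state (S, z) has fixed size, each decoding step costs O(1) in the sequence length, which is the source of the speedup over quadratic softmax attention on long documents.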