We propose a new architecture for adapting a sentence-level sequence-to-sequence transformer by incorporating multiple pretrained document context signals, and we assess the impact on translation performance of (1) different pretraining approaches for generating these signals, (2) the quantity of parallel data for which document context is available, and (3) conditioning on source, target, or both source and target contexts. Experiments on the NIST Chinese-English and the IWSLT and WMT English-German tasks support four general conclusions: that using pretrained context representations markedly improves sample efficiency, that adequate parallel data resources are crucial for learning to use document context, that jointly conditioning on multiple context representations outperforms any single representation, and that source context is more valuable for translation performance than target-side context. Our best multi-context model consistently outperforms the best existing context-aware transformers.
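To make the idea of conditioning a sentence-level transformer on multiple pretrained context signals concrete, the following is a minimal, hypothetical sketch (not the paper's released implementation): the module names, the gating mechanism, and the assumption that each pretrained context is already encoded as a fixed-size vector are all illustrative choices.

```python
# Hypothetical sketch: fusing several pretrained document-context vectors
# into the sentence-level encoder output via cross-attention and a gate.
# ContextFusion, d_model, n_heads, and the toy shapes below are assumptions
# made for illustration, not the authors' actual architecture.
import torch
import torch.nn as nn

class ContextFusion(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Cross-attention from sentence states (queries) to context signals (keys/values).
        self.ctx_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate deciding, per position, how much attended context to mix in.
        self.gate = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, sent_states: torch.Tensor, context_signals: torch.Tensor) -> torch.Tensor:
        # sent_states:     (batch, src_len, d_model) sentence-level encoder output
        # context_signals: (batch, n_ctx, d_model) stacked pretrained context vectors
        ctx, _ = self.ctx_attn(sent_states, context_signals, context_signals)
        g = torch.sigmoid(self.gate(torch.cat([sent_states, ctx], dim=-1)))
        # Gated residual combination, normalized before being passed to the decoder.
        return self.norm(sent_states + g * ctx)

# Toy usage: two sentences of 20 tokens, three context signals each
# (e.g. one source-side, one target-side, one document-level representation).
fusion = ContextFusion(d_model=512)
sent = torch.randn(2, 20, 512)
contexts = torch.randn(2, 3, 512)
out = fusion(sent, contexts)  # (2, 20, 512)
```

Under this sketch, conditioning on source, target, or both contexts simply changes which pretrained vectors are stacked into `context_signals`, while the sentence-level transformer itself is left unchanged.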