The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first outperformed by the convolutional seq2seq model, which was in turn outperformed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all three fundamental architectures on the benchmark WMT'14 English-to-French and English-to-German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.