The Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising for improving models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has become the de facto standard for developing Transformer systems; the other uses deeper language representations but faces the difficulty of learning deep networks. Here, we continue the latter line of research. We claim that a truly deep Transformer model can surpass its Transformer-Big counterpart through 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On the WMT'16 English-German, NIST OpenMT'12 Chinese-English, and larger WMT'18 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4-2.4 BLEU points. As another bonus, the deep model is 1.6X smaller in size and 3X faster to train than Transformer-Big.
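The abstract does not spell out the two techniques in detail, so the PyTorch sketch below illustrates one plausible reading under stated assumptions: layer normalization applied before each sub-layer (pre-norm) so that deep stacks remain trainable, and a learned linear combination of all previous layers' outputs fed to the next layer. The module names `PreNormFeedForward` and `LayerCombination`, and all hyper-parameters, are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class PreNormFeedForward(nn.Module):
    """Encoder sub-layer with layer normalization applied *before* the
    sub-layer (pre-norm), leaving an identity path for the residual.
    Illustrative only; sizes and names are assumptions."""
    def __init__(self, d_model=512, d_ff=2048, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Pre-norm residual: y = x + SubLayer(LN(x));
        # a post-norm baseline would compute LN(x + SubLayer(x)) instead.
        return x + self.dropout(self.ff(self.norm(x)))

class LayerCombination(nn.Module):
    """Learned linear combination of all previous layer outputs, used as the
    input to the next layer -- a hypothetical sketch of 'passing the
    combination of previous layers to the next'."""
    def __init__(self, n_inputs):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(n_inputs))

    def forward(self, layer_outputs):
        # layer_outputs: list of tensors, one per previous layer (plus embedding).
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * yi for wi, yi in zip(w, layer_outputs))

# Toy deep encoder: each layer consumes a learned combination of everything
# produced so far (embedding output counts as layer 0).
n_layers = 4
layers = nn.ModuleList(PreNormFeedForward() for _ in range(n_layers))
combines = nn.ModuleList(LayerCombination(i + 1) for i in range(n_layers))

x = torch.randn(10, 2, 512)          # (seq_len, batch, d_model)
outputs = [x]
for layer, combine in zip(layers, combines):
    outputs.append(layer(combine(outputs)))
print(outputs[-1].shape)             # torch.Size([10, 2, 512])
```

In this sketch the combination weights are initialized uniformly (softmax over zeros) and learned jointly with the rest of the network; the design choice being illustrated is that deeper layers can draw directly on shallow representations instead of relying on a single residual chain.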