The Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation. However, deploying Transformers is challenging because different scenarios require models of different complexities and scales, and naively training multiple Transformers is redundant in terms of both computation and memory. In this paper, we propose a novel Scalable Transformers architecture that naturally contains sub-Transformers of different scales with shared parameters. Each sub-Transformer can be easily obtained by cropping the parameters of the largest Transformer. A three-stage training scheme is proposed to tackle the difficulty of training the Scalable Transformers, introducing additional supervision from word-level and sequence-level self-distillation. Extensive experiments were conducted on WMT En-De and En-Fr to validate our proposed Scalable Transformers.
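To make the parameter-cropping idea concrete, the following is a minimal sketch (not the authors' released code) of how a narrower sub-model can reuse the leading slice of each weight matrix of the full model, so that all sub-Transformers share a single set of parameters. The module and argument names (ShareableFeedForward, d_sub, h_sub) are illustrative assumptions, not identifiers from the paper.

```python
# Illustrative sketch: one feed-forward block whose width can be "cropped"
# at run time, so sub-models of different scales share the full model's weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShareableFeedForward(nn.Module):
    """Feed-forward block storing parameters only at the largest scale."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(d_hidden, d_model) * 0.02)
        self.b1 = nn.Parameter(torch.zeros(d_hidden))
        self.w2 = nn.Parameter(torch.randn(d_model, d_hidden) * 0.02)
        self.b2 = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor, d_sub: int, h_sub: int) -> torch.Tensor:
        # Crop the full weights: a sub-model of width (d_sub, h_sub) uses the
        # leading rows/columns, so its parameters are a subset of the full ones.
        w1, b1 = self.w1[:h_sub, :d_sub], self.b1[:h_sub]
        w2, b2 = self.w2[:d_sub, :h_sub], self.b2[:d_sub]
        h = F.relu(F.linear(x, w1, b1))
        return F.linear(h, w2, b2)


if __name__ == "__main__":
    layer = ShareableFeedForward(d_model=512, d_hidden=2048)
    x_small = torch.randn(4, 10, 256)  # input already at the sub-model width
    x_full = torch.randn(4, 10, 512)
    y_small = layer(x_small, d_sub=256, h_sub=1024)  # cropped sub-Transformer layer
    y_full = layer(x_full, d_sub=512, h_sub=2048)    # largest Transformer layer
    print(y_small.shape, y_full.shape)
```

Under this kind of weight sharing, gradients from every sub-model flow into the same cropped slices, which is why the abstract's three-stage training scheme and self-distillation signals are needed to keep the scales from interfering with one another.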