With the rapid development of artificial intelligence (AI), there is a trend toward moving AI applications such as neural machine translation (NMT) from the cloud to mobile devices such as smartphones. Constrained by limited hardware resources and battery capacity, the performance of on-device NMT systems is far from satisfactory. Inspired by conditional computation, we propose to improve the performance of on-device NMT systems with dynamic multi-branch layers. Specifically, we design a layer-wise dynamic multi-branch network in which only one branch is activated during training and inference. As not all branches are activated during training, we propose shared-private reparameterization to ensure sufficient training for each branch. At almost the same computational cost, our method achieves improvements of up to 1.7 BLEU points on the WMT14 English-German translation task and 1.8 BLEU points on the WMT20 Chinese-English translation task over the Transformer model. Compared with a strong baseline that also uses multiple branches, the proposed method is up to 1.6 times faster with the same number of parameters.
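To make the idea concrete, below is a minimal PyTorch sketch of a dynamic multi-branch feed-forward layer, not the authors' implementation. The gate design, the argmax-based hard selection, the class and parameter names, and the reading of shared-private reparameterization as "effective weight = shared weight + branch-private weight" are all illustrative assumptions; training a hard gate would additionally require a differentiable estimator such as Gumbel-softmax or straight-through, which is omitted here.

```python
# A hedged sketch of a dynamic multi-branch feed-forward layer:
# a lightweight gate scores the branches, and only the top-scoring
# branch is executed for each input, so per-example compute matches
# a single-branch layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMultiBranchFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_branches: int):
        super().__init__()
        # Shared parameters, updated whenever any branch is selected.
        self.w1_shared = nn.Parameter(torch.empty(d_ff, d_model))
        self.w2_shared = nn.Parameter(torch.empty(d_model, d_ff))
        nn.init.xavier_uniform_(self.w1_shared)
        nn.init.xavier_uniform_(self.w2_shared)
        # Branch-private parameters, one set per branch. Initialized to
        # zero so every branch initially equals the shared weights
        # (an illustrative choice, not necessarily the paper's).
        self.w1_private = nn.Parameter(torch.zeros(num_branches, d_ff, d_model))
        self.w2_private = nn.Parameter(torch.zeros(num_branches, d_model, d_ff))
        # Gate that scores branches from the mean-pooled input.
        self.gate = nn.Linear(d_model, num_branches)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.gate(x.mean(dim=1))  # (batch, num_branches)
        # Hard selection: exactly one branch per sentence. argmax is not
        # differentiable w.r.t. the gate; training would need an
        # estimator such as Gumbel-softmax (omitted for clarity).
        k = scores.argmax(dim=-1)          # (batch,)
        out = torch.empty_like(x)
        for b in k.unique():
            mask = k == b
            # Shared-private reparameterization: the effective weight is
            # the sum of shared and branch-specific parameters.
            w1 = self.w1_shared + self.w1_private[b]
            w2 = self.w2_shared + self.w2_private[b]
            h = F.relu(F.linear(x[mask], w1))
            out[mask] = F.linear(h, w2)
        return out

# Usage: only one branch runs per sentence, so FLOPs stay close to a
# single-branch feed-forward layer.
layer = DynamicMultiBranchFFN(d_model=512, d_ff=2048, num_branches=4)
y = layer(torch.randn(2, 10, 512))
print(y.shape)  # torch.Size([2, 10, 512])
```

Because exactly one branch executes per input, compute stays at roughly the single-branch level; the shared component receives gradient whichever branch is chosen, which is one way the under-training of rarely selected branches can be mitigated.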