Training neural machine translation (NMT) models in federated learning (FL) settings can be inefficient both computationally and in terms of communication, due to the large size of translation engines and the multiple rounds of updates required to train clients and a central server. In this paper, we propose a novel solution for building NMT models efficiently in an FL setup. To reduce the communication overhead, out of all neural layers we exchange only what we term "Controller" layers. Controllers are a small number of additional neural components connected to our pre-trained architectures. These new components are placed between the original layers. They act as liaisons that communicate with the central server and learn the minimal information sufficient to update clients. We evaluated the performance of our models on five datasets from different domains, translating from German into English. We found that models equipped with Controllers perform on par with those trained in a centralized, non-FL setting. In addition, we observed a substantial reduction in the communication traffic of the FL pipeline, which is a direct consequence of using Controllers. Based on our experiments, Controller-based models are ~6 times less expensive to communicate than their peers. This reduction is significant given the number of parameters in large models, and it becomes even more critical when such parameters must be exchanged over multiple rounds in FL settings.
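
To make the idea concrete, the following is a minimal PyTorch sketch, assuming Controllers behave like adapter-style bottleneck modules with a residual connection inserted between frozen pre-trained layers; the class name, the bottleneck size, and the controller_state helper are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn

class Controller(nn.Module):
    # Hypothetical adapter-style bottleneck placed between frozen
    # pre-trained layers; only its parameters are exchanged with the server.
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the original representation.
        return x + self.up(torch.relu(self.down(x)))

def controller_state(model: nn.Module) -> dict:
    # Collect only Controller parameters for the client-to-server update,
    # so the large pre-trained weights never leave the client.
    return {name: p.detach().cpu()
            for name, p in model.named_parameters()
            if "controller" in name.lower()}

Under these assumptions, the pre-trained NMT weights stay frozen on each client and only the Controllers are trained, so every FL round exchanges controller_state(model) rather than the full model, which is the source of the reported communication savings.
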