To protect user privacy and comply with legal regulations, federated learning (FL) is attracting significant attention. Training neural machine translation (NMT) models with traditional FL algorithms (e.g., FedAvg) typically relies on multi-round model-based interactions. However, this is impractical and inefficient for machine translation tasks due to the vast communication overhead and heavy synchronization requirements. In this paper, we propose a novel federated nearest neighbor (FedNN) machine translation framework that, instead of multi-round model-based interactions, leverages a one-round memorization-based interaction to share knowledge across different clients and build low-overhead privacy-preserving systems. The approach equips the public NMT model, trained on large-scale accessible data, with a $k$-nearest-neighbor ($k$NN) classifier and integrates the external datastores constructed from the private text data of all clients to form the final FL model. A two-phase datastore encryption strategy is introduced to preserve privacy during this process. Extensive experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg, while maintaining promising performance in different FL settings.
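To make the memorization-based mechanism concrete, the sketch below (not the authors' implementation; the function names `build_datastore`, `knn_probs`, `interpolate`, and the parameters `k`, `temperature`, `lambda_` are illustrative assumptions) shows the standard $k$NN-MT-style recipe the abstract builds on: decoder hidden states act as keys into a datastore of (hidden state, target token) pairs built from a client's private text, and the retrieved neighbors yield a token distribution that is mixed with the base NMT distribution.

```python
# Minimal sketch of kNN-augmented translation prediction, assuming a datastore
# of (decoder hidden state, gold next token) pairs built from private client data.
import numpy as np

def build_datastore(hidden_states, target_tokens):
    """Store decoder hidden states as keys and gold next tokens as values."""
    keys = np.asarray(hidden_states, dtype=np.float32)   # shape (N, d)
    values = np.asarray(target_tokens, dtype=np.int64)   # shape (N,)
    return keys, values

def knn_probs(query, keys, values, vocab_size, k=4, temperature=10.0):
    """Brute-force k-nearest-neighbor search, softmax over negative distances."""
    dists = np.sum((keys - query) ** 2, axis=1)           # squared L2 distances
    nn = np.argsort(dists)[:k]                            # indices of k nearest keys
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()
    probs = np.zeros(vocab_size)
    for w, v in zip(weights, values[nn]):
        probs[v] += w                                      # accumulate weight per token
    return probs

def interpolate(p_nmt, p_knn, lambda_=0.5):
    """Mix the retrieval-based and model-based next-token distributions."""
    return lambda_ * p_knn + (1.0 - lambda_) * p_nmt

# Toy usage: 3-token vocabulary, 2-dimensional hidden states.
keys, values = build_datastore([[0.1, 0.2], [0.9, 0.8], [0.15, 0.25]], [2, 1, 2])
p_knn = knn_probs(np.array([0.12, 0.22]), keys, values, vocab_size=3, k=2)
p_nmt = np.array([0.6, 0.3, 0.1])
print(interpolate(p_nmt, p_knn))
```

In the federated setting described by the abstract, only such datastores (after the two-phase encryption step) would need to be shared in a single round, rather than model parameters over many rounds.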