Federated learning (FL) enables learning from decentralized privacy-sensitive data, with computations on raw data confined to take place at edge clients. This paper introduces mixed FL, which incorporates an additional loss term calculated at the coordinating server (while maintaining FL's private data restrictions). There are numerous benefits. For example, additional datacenter data can be leveraged to jointly learn from centralized (datacenter) and decentralized (federated) training data and better match an expected inference data distribution. Mixed FL also enables offloading some intensive computations (e.g., embedding regularization) to the server, greatly reducing communication and client computation load. For these and other mixed FL use cases, we present three algorithms: PARALLEL TRAINING, 1-WAY GRADIENT TRANSFER, and 2-WAY GRADIENT TRANSFER. We state convergence bounds for each, and give intuition on which are suited to particular mixed FL problems. Finally, we perform extensive experiments on three tasks, demonstrating that mixed FL can blend training data to achieve an oracle's accuracy on an inference distribution, and can reduce communication and computation overhead by over 90%. Our experiments confirm theoretical predictions of how algorithms perform under different mixed FL problem settings.
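As a rough sketch of the mixed objective (the notation $w$, $f_k$, $g$, and $\lambda$ is ours for illustration and does not appear in the abstract): let $f_k(w)$ denote client $k$'s loss on its private decentralized data and $g(w)$ the additional loss computed at the coordinating server on centralized data. Mixed FL then targets a weighted combination of the two, e.g.,

\[
\min_{w} \;\; \lambda \, \mathbb{E}_{k \sim \mathcal{K}}\!\left[ f_k(w) \right] \;+\; (1 - \lambda)\, g(w), \qquad \lambda \in [0, 1],
\]

where only the first term touches raw client data. The three algorithms named above differ in how updates for the two terms are computed and combined between the server and the clients.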