Federated learning (FL) enables the distribution of machine learning workloads from the cloud to resource-limited edge devices. Unfortunately, current deep networks are not only too compute-heavy for inference and training on edge devices, but also too large for communicating updates over bandwidth-constrained networks. In this paper, we develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST), by which complex neural networks can be deployed and trained with substantially improved efficiency in both on-device computation and in-network communication. At the core of FedDST is a dynamic process that extracts and trains sparse sub-networks from the target full network. With this scheme, "two birds are killed with one stone": instead of full models, each client performs efficient training of its own sparse network, and only sparse networks are transmitted between devices and the cloud. Furthermore, our results reveal that dynamic sparsity during FL training accommodates local heterogeneity among FL agents more flexibly than fixed, shared sparse masks. Moreover, dynamic sparsity naturally introduces an "in-time self-ensembling effect" into the training dynamics and improves FL performance even over dense training. In a realistic and challenging non-IID FL setting, FedDST consistently outperforms competing algorithms in our experiments: for instance, on non-IID CIFAR-10 it gains an impressive 10% accuracy advantage over FedAvgM at the same upload data cap, and the gap remains 3% even when FedAvgM is given 2x the upload data cap, further demonstrating the efficacy of FedDST. Code is available at: https://github.com/bibikar/feddst.
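To make the core idea of extracting and dynamically adjusting sparse sub-networks concrete, the following is a minimal illustrative sketch, not the authors' implementation (see the repository above for that). It assumes magnitude-based pruning to build a layer mask and a hypothetical prune-and-regrow step in the style of dynamic sparse training; the names `make_mask`, `readjust_mask`, `density`, and `adjust_frac` are all illustrative.

```python
# Hedged sketch of dynamic sparse mask extraction and readjustment (PyTorch).
# Not the FedDST reference code; function and parameter names are hypothetical.
import torch


def make_mask(weight: torch.Tensor, density: float) -> torch.Tensor:
    """Keep the largest-magnitude fraction `density` of weights in this layer."""
    k = max(1, int(density * weight.numel()))
    threshold = torch.topk(weight.abs().flatten(), k, largest=True).values.min()
    return (weight.abs() >= threshold).float()


def readjust_mask(weight: torch.Tensor, grad: torch.Tensor,
                  mask: torch.Tensor, adjust_frac: float = 0.1) -> torch.Tensor:
    """Drop the smallest-magnitude active weights, then regrow the same number
    of currently-inactive weights with the largest gradient magnitudes."""
    n_active = int(mask.sum().item())
    n_adjust = max(1, int(adjust_frac * n_active))

    # Prune: among active weights, remove the n_adjust smallest magnitudes.
    active_mag = torch.where(mask.bool(), weight.abs(),
                             torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(active_mag.flatten(), n_adjust, largest=False).indices
    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0

    # Regrow: among inactive weights, add the n_adjust largest gradient magnitudes.
    inactive_grad = torch.where(new_mask.bool().view_as(grad),
                                torch.zeros_like(grad), grad.abs())
    grow_idx = torch.topk(inactive_grad.flatten(), n_adjust, largest=True).indices
    new_mask[grow_idx] = 1.0
    return new_mask.view_as(mask)


# In an FL round, each client would train only the masked weights (weight * mask)
# and upload the sparse parameters, so both on-device computation and uplink
# communication scale with `density` rather than with the full model size.
```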