One main challenge in federated learning is the large communication cost of exchanging weight updates from clients to the server at each round. While prior work has made great progress in compressing the weight updates through gradient compression methods, we propose a radically different approach that does not update the weights at all. Instead, our method freezes the weights at their initial \emph{random} values and learns how to sparsify the random network for the best performance. To this end, the clients collaborate in training a \emph{stochastic} binary mask to find the optimal sparse random network within the original one. At the end of the training, the final model is a sparse network with random weights -- or a subnetwork inside the dense random network. We show improvements in accuracy, communication (less than $1$ bit per parameter (bpp)), convergence speed, and final model size (less than $1$ bpp) over relevant baselines on MNIST, EMNIST, CIFAR-10, and CIFAR-100 datasets, in the low bitrate regime under various system configurations.
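To make the idea above concrete, the following PyTorch-style sketch shows one way a single layer of such a model could look: the layer's random weights are frozen, and only per-weight mask probabilities are trained, with a stochastic binary mask sampled in the forward pass through a straight-through estimator. This is our own illustrative code under assumed names (\texttt{MaskedLinear}, \texttt{score}), a sketch of the general technique rather than the paper's implementation.

\begin{verbatim}
# Illustrative sketch only: class and variable names are assumptions.
# Weights are frozen at their random initialization; per-weight mask
# probabilities are the only trainable (and communicated) quantities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Frozen random weights: never updated during training.
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features) / in_features ** 0.5,
            requires_grad=False)
        # Trainable mask logits, one per weight.
        self.score = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        prob = torch.sigmoid(self.score)   # Bernoulli parameter per weight
        mask = torch.bernoulli(prob)       # stochastic binary mask sample
        # Straight-through estimator: hard mask in the forward pass,
        # gradients flow through the probabilities in the backward pass.
        mask = mask + prob - prob.detach()
        return F.linear(x, self.weight * mask)
\end{verbatim}

Under such a setup, each round a client would only need to communicate information about its mask, on the order of one bit per weight or less, while the frozen random weights themselves need not be transmitted at all (for instance, if they are generated from a shared seed).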