Federated learning commonly relies on algorithms such as distributed (mini-batch) SGD, where multiple clients compute their gradients and send them to a central coordinator, which averages them and updates the model. To reduce transmission time and improve the scalability of the training process, clients often use lossy compression to reduce the message sizes. DRIVE is a recent state-of-the-art algorithm that compresses gradients using one bit per coordinate (with some lower-order overhead). In this technical report, we generalize DRIVE to support any bandwidth constraint, and further extend it to support heterogeneous client resources and to be robust to packet loss.
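For intuition, the following is a minimal Python/NumPy sketch of the kind of one-bit pipeline the report refers to: each client rotates its gradient with a shared random rotation, keeps only the per-coordinate signs plus a single scale, and the coordinator inverts the rotation and averages the reconstructions. The rotation construction and the scale choice here are illustrative assumptions, not DRIVE's exact algorithm.

```python
import numpy as np

def random_rotation(dim, seed):
    """Random orthogonal matrix derived from a shared seed (illustrative;
    practical schemes use cheaper structured rotations)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def compress(grad, rot):
    """One bit per coordinate (the signs) plus a single float scale."""
    rotated = rot @ grad
    signs = np.sign(rotated)
    signs[signs == 0] = 1.0
    scale = np.abs(rotated).sum() / grad.size  # one possible scale choice
    return signs.astype(np.int8), scale

def decompress(signs, scale, rot):
    """Coordinator-side reconstruction: undo the rotation and rescale."""
    return scale * (rot.T @ signs.astype(np.float64))

# The coordinator averages the reconstructed client gradients, as in distributed SGD.
dim, seed = 16, 0
rot = random_rotation(dim, seed)
client_grads = [np.random.default_rng(i + 1).standard_normal(dim) for i in range(4)]
estimates = [decompress(*compress(g, rot), rot) for g in client_grads]
avg_grad = np.mean(estimates, axis=0)
```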