A significant bottleneck in federated learning is the network communication cost of sending model updates from client devices to the central server. We propose a method to reduce this cost. Our method encodes quantized updates with an appropriate universal code, taking into account their empirical distribution. Because quantization introduces error, we select quantization levels by optimizing for the desired trade-off between average total bitrate and gradient distortion. We demonstrate empirically that in spite of the non-i.i.d. nature of federated learning, the rate-distortion frontier is consistent across datasets, optimizers, clients and training rounds, and within each setting, distortion reliably predicts model performance. This allows for a remarkably simple compression scheme that is near-optimal in many use cases, and outperforms Top-K, DRIVE, 3LC and QSGD on the Stack Overflow next-word prediction benchmark.
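To make the described pipeline concrete, the following is a minimal sketch of the general idea: quantize a client's model update, estimate the bitrate a universal (entropy) code would spend on the quantized symbols given their empirical distribution, and choose the number of quantization levels that best trades off rate against distortion. The uniform quantizer, the entropy proxy for the universal code, and the lambda-weighted objective are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch: rate-distortion-guided selection of quantization levels for one
# client update. Assumptions (not from the paper): uniform quantization,
# empirical entropy as a proxy for universal-code length, and a fixed
# lambda weighting distortion against rate.
import numpy as np

def quantize(update, num_levels):
    """Uniformly quantize `update` to `num_levels` levels over its range."""
    lo, hi = update.min(), update.max()
    step = (hi - lo) / max(num_levels - 1, 1)
    indices = np.round((update - lo) / step).astype(np.int64)
    dequantized = lo + indices * step
    return indices, dequantized

def empirical_entropy_bits(indices):
    """Bits/symbol an ideal entropy code needs for the empirical distribution."""
    _, counts = np.unique(indices, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def select_levels(update, candidate_levels=(2, 4, 8, 16, 32), lam=1e3):
    """Pick the level count minimizing rate + lam * distortion (MSE)."""
    best = None
    for L in candidate_levels:
        idx, deq = quantize(update, L)
        rate = empirical_entropy_bits(idx)               # bits per parameter
        distortion = float(np.mean((update - deq) ** 2))  # gradient distortion
        cost = rate + lam * distortion
        if best is None or cost < best[0]:
            best = (cost, L, rate, distortion)
    return best  # (cost, levels, bits_per_param, mse)

update = np.random.randn(10_000) * 0.01  # stand-in for one client's update
print(select_levels(update))
```

In practice the chosen symbols would be sent through an actual universal code (e.g., an arithmetic or Elias-style coder) rather than scored by entropy alone; the sketch only illustrates how a rate-distortion objective can drive the choice of quantization levels.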