Both Byzantine resilience and communication efficiency have recently attracted tremendous attention for their significance in edge federated learning. However, most existing algorithms may fail when dealing with real-world irregular data that is heavy-tailed. To address this issue, we study the stochastic convex and non-convex optimization problems for federated learning at the edge and show how to handle heavy-tailed data while simultaneously retaining Byzantine resilience, communication efficiency, and order-optimal statistical error rates. Specifically, we first present a Byzantine-resilient distributed gradient descent algorithm that can handle heavy-tailed data while still converging under standard assumptions. To reduce the communication overhead, we further propose a second algorithm that incorporates gradient compression techniques to save communication costs during the learning process. Theoretical analysis shows that both algorithms achieve order-optimal statistical error rates in the presence of Byzantine devices. Finally, we conduct extensive experiments on both synthetic and real-world datasets to verify the efficacy of our algorithms.
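The abstract does not spell out the algorithms themselves, so the following is only a minimal sketch of the kind of training round it describes, under assumed choices: heavy-tailed gradients handled by worker-side clipping, Byzantine resilience by a coordinate-wise trimmed mean at the server, and compression by top-k sparsification. The function names (clip_gradient, top_k_compress, trimmed_mean_aggregate) and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def clip_gradient(g, tau):
    """Worker-side clipping (assumed heavy-tail remedy): shrink the gradient to norm <= tau."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)

def top_k_compress(g, k):
    """Keep only the k largest-magnitude coordinates (a standard sparsifying compressor)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def trimmed_mean_aggregate(grads, trim_fraction):
    """Server-side coordinate-wise trimmed mean: in each coordinate, drop the largest and
    smallest trim_fraction of values before averaging, so a bounded fraction of
    Byzantine gradients cannot dominate the update."""
    G = np.sort(np.stack(grads), axis=0)   # shape: (num_workers, dim), sorted per coordinate
    m = G.shape[0]
    b = int(np.floor(trim_fraction * m))
    return G[b:m - b].mean(axis=0)

# One illustrative round: honest workers clip and compress heavy-tailed gradients,
# Byzantine workers send arbitrary vectors, and the server aggregates robustly.
rng = np.random.default_rng(0)
dim, workers, byzantine = 10, 20, 4
true_grad = rng.normal(size=dim)
grads = [clip_gradient(true_grad + rng.standard_t(df=2, size=dim), tau=5.0)
         for _ in range(workers - byzantine)]
grads += [rng.normal(scale=100.0, size=dim) for _ in range(byzantine)]  # adversarial updates
grads = [top_k_compress(g, k=5) for g in grads]
update = trimmed_mean_aggregate(grads, trim_fraction=0.25)
```

In this sketch the trim fraction is set above the assumed Byzantine fraction (4/20), which is the usual condition for trimmed-mean-style aggregation to remain robust; the paper's actual aggregation rule and compression operator may differ.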