Communication between workers and the master node to collect local stochastic gradients is a key bottleneck in a large-scale federated learning system. Various recent works have proposed to compress the local stochastic gradients to mitigate the communication overhead. However, robustness to malicious attacks is rarely considered in such a setting. In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious. We point out that a vanilla combination of compressed stochastic gradient descent (SGD) and geometric median-based robust aggregation suffers from both stochastic noise and compression noise in the presence of Byzantine attacks. In light of this observation, we propose to jointly reduce the stochastic and compression noise so as to improve the Byzantine-robustness. For the stochastic noise, we adopt the stochastic average gradient algorithm (SAGA) to gradually eliminate the inner variations of the regular workers. For the compression noise, we apply gradient difference compression so as to achieve compression for free. We theoretically prove that the proposed algorithm reaches a neighborhood of the optimal solution at a linear convergence rate, and that the asymptotic learning error is of the same order as that of the state-of-the-art uncompressed method. Finally, numerical experiments demonstrate the effectiveness of the proposed method.
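To make two of the ingredients named above concrete, the following is a minimal sketch of geometric median-based robust aggregation combined with gradient difference compression, assuming numpy, a top-k compressor, and a Weiszfeld solver for the geometric median. The names `Worker`, `Master`, and `top_k_compress`, the toy quadratic objective, and the choice of compressor are illustrative assumptions; the SAGA-based variance reduction and the paper's actual algorithm and analysis are not reproduced here.

```python
import numpy as np

def top_k_compress(v, k):
    """Keep only the k largest-magnitude coordinates of v (one common
    compressor; an assumption -- the paper's operator may differ)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def geometric_median(points, n_iter=100, eps=1e-8):
    """Approximate the geometric median of the worker messages with the
    Weiszfeld iteration; this is the robust aggregation rule."""
    z = points.mean(axis=0)
    for _ in range(n_iter):
        dist = np.maximum(np.linalg.norm(points - z, axis=1), eps)
        w = 1.0 / dist
        z = (w[:, None] * points).sum(axis=0) / w.sum()
    return z

class Worker:
    """Hypothetical regular worker: it keeps a reference state h and only
    transmits the compressed difference between its current gradient and h
    (gradient difference compression)."""
    def __init__(self, dim, k):
        self.h = np.zeros(dim)
        self.k = k

    def message(self, grad):
        delta = top_k_compress(grad - self.h, self.k)
        self.h = self.h + delta   # the state gradually tracks the gradient
        return delta              # only this sparse correction is communicated

class Master:
    """Hypothetical master: it mirrors each worker's state, reconstructs the
    workers' gradient estimates from the received corrections, and
    aggregates them with the geometric median instead of the mean."""
    def __init__(self, n_workers, dim):
        self.h = np.zeros((n_workers, dim))

    def aggregate(self, deltas):
        self.h += np.stack(deltas)       # same state update as the workers
        return geometric_median(self.h)  # Byzantine-robust aggregation

# Toy usage: 5 workers (one Byzantine) minimizing 0.5 * ||x - 1||^2.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, k, n_workers = 10, 3, 5
    workers = [Worker(dim, k) for _ in range(n_workers)]
    master = Master(n_workers, dim)
    x = np.zeros(dim)
    for step in range(20):
        grads = [x - 1.0 + 0.1 * rng.standard_normal(dim) for _ in range(n_workers)]
        grads[0] = 100.0 * rng.standard_normal(dim)  # arbitrarily malicious message
        deltas = [w.message(g) for w, g in zip(workers, grads)]
        x = x - 0.5 * master.aggregate(deltas)
    print(x)  # roughly approaches the regular optimum (all ones) despite the attack
```

In this sketch the master never receives full gradients: each worker sends a sparse correction, and because both sides apply the same state update, the master can reconstruct each worker's (possibly malicious) gradient estimate before applying the robust geometric-median aggregation.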