We consider the federated learning problem where data on workers are not independent and identically distributed (i.i.d.). During the learning process, an unknown number of Byzantine workers may send malicious messages to the central node, leading to significant learning errors. Most Byzantine-robust methods address this issue by aggregating the received messages with robust aggregation rules, but they rely on the assumption that all regular workers have i.i.d. data, which does not hold in many federated learning applications. In light of the significance of reducing stochastic gradient noise for mitigating the effect of Byzantine attacks, we use a resampling strategy to reduce the impact of both inner variation (which describes the sample heterogeneity on every regular worker) and outer variation (which describes the sample heterogeneity among the regular workers), along with a stochastic average gradient algorithm to gradually eliminate the inner variation. The variance-reduced messages are then aggregated with a robust geometric median operator. We prove that the proposed method converges linearly to a neighborhood of the optimal solution, with the learning error determined by the number of Byzantine workers. Numerical experiments corroborate the theoretical results and show that the proposed method outperforms state-of-the-art methods in the non-i.i.d. setting.
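To make the aggregation step concrete, below is a minimal sketch of how the central node could robustly combine the variance-reduced messages with a geometric median, here approximated by Weiszfeld-type iterations. The function name `geometric_median`, the iteration counts, and the synthetic "Byzantine" messages are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def geometric_median(points, num_iters=100, eps=1e-8):
    """Approximate the geometric median of the received messages
    (one row per worker) with Weiszfeld's iterations."""
    z = points.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(num_iters):
        dists = np.linalg.norm(points - z, axis=1)
        dists = np.maximum(dists, eps)            # avoid division by zero
        weights = 1.0 / dists
        z_new = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(z_new - z) < eps:       # converged
            break
        z = z_new
    return z

# Hypothetical usage: aggregate messages from 10 workers, 2 of which
# are Byzantine and send large malicious values.
rng = np.random.default_rng(0)
regular = rng.normal(loc=1.0, scale=0.1, size=(8, 5))
byzantine = rng.normal(loc=50.0, scale=5.0, size=(2, 5))
messages = np.vstack([regular, byzantine])
print("mean aggregate:   ", messages.mean(axis=0))   # heavily skewed
print("geometric median: ", geometric_median(messages))  # close to regular messages
```

In this toy example the coordinate-wise mean is pulled far from the regular workers' messages by the two malicious ones, while the geometric median stays in their neighborhood, which is the robustness property the abstract relies on.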