Distributed learning often suffers from Byzantine failures, and a number of works have studied distributed stochastic optimization under Byzantine failures, where only a portion of the workers, instead of all the workers in a distributed learning system, compute stochastic gradients at each iteration. These methods, although workable under Byzantine failures, suffer from either a sub-optimal convergence rate or a high computation cost. To address this, we propose a new Byzantine-resilient stochastic gradient descent algorithm (BrSGD for short) that is provably robust against Byzantine failures. BrSGD achieves optimal statistical performance and efficient computation simultaneously. In particular, BrSGD achieves an order-optimal statistical error rate for strongly convex loss functions. The computational complexity of BrSGD is O(md), where d is the model dimension and m is the number of machines. Experimental results show that BrSGD obtains results competitive with those of non-Byzantine machines in terms of effectiveness and convergence.
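To make the setting concrete, the following is a minimal sketch of Byzantine-robust gradient aggregation on a toy strongly convex loss. The abstract does not state BrSGD's aggregation rule, so the sketch substitutes the coordinate-wise median, a well-known robust aggregator, and should not be read as the BrSGD algorithm. The function names, the toy loss, the worker counts, and the step size are all assumptions made for illustration. The sort-based median used here costs O(d m log m) per iteration; a selection-based median would be needed to reach a cost linear in m.

```python
import random
from statistics import median


def coordinate_median(gradients):
    """Aggregate m gradient vectors (each of length d) by per-coordinate median."""
    # zip(*gradients) yields one tuple per coordinate, holding that
    # coordinate's value from every worker.
    return [median(column) for column in zip(*gradients)]


def robust_sgd(num_workers=10, num_byzantine=3, steps=200, lr=0.1, seed=0):
    """Minimize the toy loss f(w) = 0.5 * ||w - target||^2 (hypothetical setup).

    The first `num_byzantine` workers send arbitrary vectors; the rest send
    the true gradient plus small Gaussian noise.
    """
    rng = random.Random(seed)
    target = [1.0, -2.0, 3.0]  # assumed optimum of the toy loss
    w = [0.0] * len(target)
    for _ in range(steps):
        grads = []
        for worker in range(num_workers):
            if worker < num_byzantine:
                # Byzantine worker: reports a large random vector.
                grads.append([rng.uniform(-100.0, 100.0) for _ in w])
            else:
                # Honest worker: stochastic gradient (w - target) + noise.
                grads.append(
                    [wi - ti + rng.gauss(0.0, 0.1) for wi, ti in zip(w, target)]
                )
        g = coordinate_median(grads)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, target


if __name__ == "__main__":
    w, target = robust_sgd()
    print("estimate:", [round(x, 3) for x in w])
    print("target:  ", target)
```

Because honest workers form a strict majority in this sketch, the median of each coordinate always lies within the range of the honest values, so the iterates stay close to the optimum despite the corrupted gradients. Replacing `coordinate_median` with a plain mean would let the Byzantine vectors dominate each update.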