This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks. To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules. However, the sizeable SGD-induced stochastic gradient noise makes it challenging to distinguish malicious messages sent by Byzantine attackers from the noisy stochastic gradients sent by 'honest' workers. This motivates reducing the variance of stochastic gradients as a means of robustifying SGD in the presence of Byzantine attacks. To this end, the present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks. Rather than the mean employed by distributed SAGA, the novel Byrd-SAGA relies on the geometric median to aggregate the corrected stochastic gradients sent by the workers. When fewer than half of the workers are Byzantine attackers, the robustness of the geometric median to outliers enables Byrd-SAGA to attain provably linear convergence to a neighborhood of the optimal solution, with the asymptotic learning error determined by the number of Byzantine workers. Numerical tests corroborate the robustness to various Byzantine attacks, as well as the merits of Byrd-SAGA over Byzantine attack resilient distributed SGD.
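The geometric median has no closed form, but it can be computed by the classical Weiszfeld fixed-point iteration. The minimal NumPy sketch below (helper names are hypothetical and this is not the authors' implementation) contrasts mean aggregation with geometric-median aggregation when one of ten workers is Byzantine:

```python
import numpy as np

def geometric_median(points, max_iter=100, tol=1e-6):
    """Geometric median of the rows of `points` via the Weiszfeld iteration."""
    z = points.mean(axis=0)                      # initialize at the mean
    for _ in range(max_iter):
        # Inverse-distance weights; clamp to avoid division by zero
        dists = np.maximum(np.linalg.norm(points - z, axis=1), 1e-12)
        w = 1.0 / dists
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z_new

# Nine honest workers report (slightly varying) gradients near (1, 1);
# one Byzantine worker sends an arbitrarily large malicious message.
honest = np.ones((9, 2)) + 0.01 * np.arange(9)[:, None]
byzantine = np.array([[1e3, 1e3]])
msgs = np.vstack([honest, byzantine])

mean_agg = msgs.mean(axis=0)      # pulled far away by the single attacker
gm_agg = geometric_median(msgs)   # stays close to the honest gradients
```

The design rationale matches the abstract's condition: the geometric median has a breakdown point of 1/2, so as long as fewer than half of the received messages are malicious, the aggregate remains in the vicinity of the honest workers' (variance-reduced) gradients.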