We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the statistical error rate of our algorithm matches that of Yin et al.~\cite{dong}, which uses more complicated schemes (coordinate-wise median, trimmed mean). Furthermore, for communication efficiency, we consider a generic class of $\delta$-approximate compressors from Karimireddy et al.~\cite{errorfeed} that encompasses sign-based compressors and top-$k$ sparsification. Our algorithm uses compressed gradients and gradient norms for aggregation and Byzantine removal, respectively. We establish the statistical error rate for non-convex smooth loss functions. We show that, in a certain range of the compression factor $\delta$, the (order-wise) rate of convergence is not affected by the compression operation. Moreover, we analyze the compressed gradient descent algorithm with error feedback (proposed in \cite{errorfeed}) in a distributed setting and in the presence of Byzantine worker machines. We show that exploiting error feedback improves the statistical error rate. Finally, we experimentally validate our results and demonstrate good convergence performance for convex (least-squares regression) and non-convex (neural network training) problems.
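The aggregation rule described above (workers report gradient norms, the server discards the largest-norm updates, and the surviving compressed gradients are averaged) can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the compressor here is a simple scaled-sign operator (one member of the $\delta$-approximate class), and the trim fraction `beta`, the function names, and the server-side norm computation are all hypothetical choices for the sketch.

```python
import numpy as np

def sign_compress(g):
    # Scaled sign operator: one example of a delta-approximate compressor
    # (1 bit per coordinate plus a single scaling scalar).
    return np.sign(g) * np.mean(np.abs(g))

def robust_aggregate(gradients, beta=0.2):
    # gradients: list of per-worker gradient vectors.
    # beta: assumed upper bound on the fraction of Byzantine workers (hypothetical).
    norms = np.array([np.linalg.norm(g) for g in gradients])
    k = int((1 - beta) * len(gradients))
    # Norm-based thresholding: keep the k gradients with the smallest norms,
    # discarding the largest-norm updates as potentially Byzantine.
    keep = np.argsort(norms)[:k]
    # Average the compressed gradients of the surviving workers.
    compressed = [sign_compress(gradients[i]) for i in keep]
    return np.mean(compressed, axis=0)
```

In this toy setting, Byzantine workers that inject large-magnitude updates are filtered out by the norm threshold before aggregation, while honest gradients pass through the compressor essentially unchanged in direction.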