This paper proposes a Robust Gradient Classification Framework (RGCF) for Byzantine fault tolerance in distributed stochastic gradient descent. The framework consists of a pattern-recognition filter trained to classify individual gradients as Byzantine from their direction alone. The filter is robust to an arbitrary number of Byzantine workers, in both convex and non-convex optimisation settings, a significant improvement over prior work, which is robust to Byzantine faults only when at most 50% of the workers are Byzantine. The solution does not require an estimate of the number of Byzantine workers; its running time is independent of the number of workers, so it scales to training instances with a large number of workers without a loss in performance. We validate the solution by training convolutional neural networks on the MNIST dataset in the presence of Byzantine workers.
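The abstract does not specify the filter's architecture. As an illustrative sketch only, the aggregation step it describes can be pictured as follows: each worker's gradient is normalised to a unit vector (so only direction matters), a classifier scores each direction as honest or Byzantine, and the server averages only the accepted gradients. The `classifier` below is a hypothetical stand-in (a fixed reference-direction test), not the trained pattern-recognition filter of RGCF.

```python
import numpy as np

def aggregate_with_filter(gradients, is_byzantine):
    """Average only the gradients whose direction the filter accepts."""
    # Normalise each gradient to unit length so classification
    # depends on direction alone, as the framework prescribes.
    dirs = [g / (np.linalg.norm(g) + 1e-12) for g in gradients]
    accepted = [g for g, d in zip(gradients, dirs) if not is_byzantine(d)]
    if not accepted:
        raise ValueError("filter rejected every gradient")
    return np.mean(accepted, axis=0)

# Hypothetical stand-in classifier: flag any direction pointing away
# from a reference direction. RGCF instead trains this component.
reference = np.array([1.0, 0.0])
classifier = lambda d: float(np.dot(d, reference)) < 0.0

honest = [np.array([2.0, 0.1]), np.array([1.5, -0.2])]
byzantine = [np.array([-3.0, 0.0])]  # sign-flipped attack gradient
agg = aggregate_with_filter(honest + byzantine, classifier)
```

Because the filter scores each gradient independently, the per-step cost grows only linearly in the number of workers and needs no estimate of how many are Byzantine, which is consistent with the scaling claim above.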