Distributed learning has become a necessity for training ever-growing models. In a distributed setting, the task is shared among several devices, and the learning process is typically coordinated by a server. However, some of the devices can be faulty, whether deliberately or not, and the usual distributed SGD algorithm cannot defend itself against omniscient adversaries. We therefore need a fault-tolerant gradient descent algorithm. We base our article on the SignSGD algorithm, which relies on sharing the signs of the gradients between the devices and the server. We provide a theoretical upper bound on the convergence rate of SignSGD, extending the results of the original paper. Our theoretical results estimate the convergence rate of SignSGD in the presence of a proportion of general adversaries, such as Byzantine adversaries. We implemented the algorithm along with Byzantine strategies designed to disrupt the learning process, and we provide empirical observations from our experiments to support our theory. Our code is available on GitHub, and our experiments can be reproduced using the provided parameters.
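For illustration, a minimal sketch of the sign-sharing scheme described above, assuming a coordinate-wise majority vote at the server; the function names (`server_aggregate`, `signsgd_step`) are hypothetical and this is not the exact implementation from our repository:

```python
import numpy as np

def server_aggregate(sign_grads):
    """Aggregate the workers' sign vectors by coordinate-wise majority vote."""
    # sign_grads: list of vectors with entries in {-1, +1}, one per worker
    vote = np.sign(np.sum(sign_grads, axis=0))
    vote[vote == 0] = 1  # break ties arbitrarily towards +1
    return vote

def signsgd_step(params, worker_grads, lr=0.01):
    """One SignSGD round: workers send only gradient signs, the server votes,
    and the voted sign vector is used as the descent direction."""
    sign_grads = [np.sign(g) for g in worker_grads]  # each worker transmits signs only
    update = server_aggregate(sign_grads)            # majority vote at the server
    return params - lr * update                      # descent step with the voted signs
```

A Byzantine worker in this sketch would simply submit an arbitrary sign vector (e.g. the flipped signs of its true gradient) to `signsgd_step`; the majority vote is what limits its influence when adversaries remain a minority.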