We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure that tolerates Byzantine failures of workers. In contrast to previous work, Zeno++ removes several unrealistic restrictions on worker-server communication, allowing fully asynchronous updates from anonymous workers, arbitrarily stale worker updates, and a potentially unbounded number of Byzantine workers. The key idea is to estimate the descent of the loss value after the candidate gradient is applied; a large descent indicates that the update makes optimization progress. We prove the convergence of Zeno++ for non-convex problems under Byzantine failures. Experimental results show that Zeno++ outperforms existing approaches.
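The descent-estimation idea can be illustrated with a small sketch. It is not the exact Zeno++ rule. It assumes that the server can evaluate some estimate of the loss (for example, on a small batch it holds itself). It also assumes a score of the form f(x) - f(x - lr * g) - rho * ||g||^2, with the candidate accepted when the score is at least -lr * eps. The names `descent_score` and `accept`, and the parameters `lr`, `rho`, and `eps`, are hypothetical choices made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def sq_norm(v: Vector) -> float:
    """Squared Euclidean norm ||v||^2."""
    return sum(vi * vi for vi in v)


def descent_score(loss: Callable[[Vector], float], x: Vector, g: Vector,
                  lr: float, rho: float) -> float:
    """Hypothetical score: estimated loss descent if candidate g is applied
    with step size lr, minus a magnitude penalty rho * ||g||^2 (assumed form)."""
    x_new = [xi - lr * gi for xi, gi in zip(x, g)]
    return loss(x) - loss(x_new) - rho * sq_norm(g)


def accept(loss: Callable[[Vector], float], x: Vector, g: Vector,
           lr: float = 0.1, rho: float = 0.01, eps: float = 0.0) -> bool:
    """Accept the candidate only if its score clears the assumed threshold.
    The score is computed at the server's current parameters x, so it does
    not depend on which (possibly stale) model the sender started from."""
    return descent_score(loss, x, g, lr, rho) >= -lr * eps


if __name__ == "__main__":
    # Toy loss f(x) = 0.5 * ||x||^2, whose true gradient at x is x itself.
    def quad(x: Vector) -> float:
        return 0.5 * sq_norm(x)

    x = [1.0, 1.0]
    honest = [1.0, 1.0]      # true gradient: applying it lowers the loss
    byzantine = [-1.0, -1.0]  # sign-flipped: applying it raises the loss
    print(accept(quad, x, honest))     # True
    print(accept(quad, x, byzantine))  # False
```

Here the honest update has a score of 1.0 - 0.81 - 0.02 = 0.17 and passes. The sign-flipped update has a score of 1.0 - 1.21 - 0.02 = -0.23 and is rejected. The filter needs only the candidate and the current model. It never needs to know which worker sent the update or how many workers are faulty.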