In distributed learning, a central server trains a model according to updates provided by nodes holding local data samples. In the presence of one or more malicious nodes sending incorrect information (a Byzantine adversary), standard algorithms for model training such as stochastic gradient descent (SGD) fail to converge. In this paper, we present a simplified convergence theory for the generic Byzantine Resilient SGD method originally proposed by Blanchard et al. [NeurIPS 2017]. Compared to the existing analysis, we show convergence to a stationary point in expectation under standard assumptions on the (possibly nonconvex) objective function and flexible assumptions on the stochastic gradients.
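As a concrete illustration (not part of the paper itself), the following minimal NumPy sketch shows one step of the generic method: the server replaces the usual gradient average with a robust aggregator before taking a descent step. The aggregator shown is Krum from Blanchard et al. [NeurIPS 2017]; the function names and the step wrapper `byzantine_resilient_sgd_step` are hypothetical, introduced only for this example.

```python
import numpy as np

def krum(gradients, f):
    """Krum aggregation (Blanchard et al., NeurIPS 2017).

    Selects the worker gradient whose summed squared distance to its
    n - f - 2 nearest neighbours is smallest, discarding outliers that
    up to f Byzantine workers may have injected.
    """
    n = len(gradients)
    assert n > 2 * f + 2, "Krum requires n > 2f + 2 workers"
    G = np.stack(gradients)  # shape (n, d)
    # Pairwise squared Euclidean distances between worker gradients.
    dists = np.sum((G[:, None, :] - G[None, :, :]) ** 2, axis=-1)
    scores = []
    for i in range(n):
        d = np.delete(dists[i], i)         # distances to the other workers
        closest = np.sort(d)[: n - f - 2]  # n - f - 2 nearest neighbours
        scores.append(closest.sum())
    return G[int(np.argmin(scores))]

def byzantine_resilient_sgd_step(x, worker_grads, f, lr=0.1):
    """One generic Byzantine-resilient SGD step: aggregate robustly, then descend."""
    return x - lr * krum(worker_grads, f)
```

The key design point is that the server never averages raw updates: any single Byzantine gradient can drag an average arbitrarily far, whereas Krum's output is always one of the honestly-clustered gradients whenever n > 2f + 2.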