This paper considers the problem of resilient distributed optimization and stochastic machine learning in a server-based architecture. The system comprises a server and multiple agents, where each agent has a local cost function. The agents collaborate with the server to find a minimum of the aggregate of their cost functions. We consider the case when some of the agents may be asynchronous and/or Byzantine faulty. In this case, the classical distributed gradient descent (DGD) algorithm is rendered ineffective. Our goal is to design techniques that improve the efficacy of DGD under asynchrony and Byzantine failures. To do so, we begin by proposing a way to model the agents' cost functions through the generic notion of $(f, \,r; \epsilon)$-redundancy, where $f$ and $r$ are the parameters of Byzantine failures and asynchrony, respectively, and $\epsilon$ characterizes the closeness between the agents' cost functions. This allows us to quantify, for any given distributed optimization problem, the level of redundancy present among the agents' cost functions. We demonstrate, both theoretically and empirically, the merits of our proposed redundancy model in improving the robustness of DGD against asynchronous and Byzantine agents, and of its extension to distributed stochastic gradient descent (D-SGD) for robust distributed machine learning with asynchronous and Byzantine agents.