We study the problem of optimizing a non-convex loss function (with saddle points) in a distributed framework in the presence of Byzantine machines. We consider a standard distributed setting with one central machine (parameter server) communicating with many worker machines. Our proposed algorithm is a variant of the celebrated cubic-regularized Newton method of Nesterov and Polyak \cite{nest}, which escapes saddle points efficiently and converges to local minima. Furthermore, our algorithm resists the presence of Byzantine machines, which may create \emph{fake local minima} near the saddle points of the loss function, an attack known as a saddle-point attack. We robustify the cubic-regularized Newton algorithm so that it efficiently avoids both the saddle points and the fake local minima. Moreover, being a second-order algorithm, it has much lower iteration complexity than its first-order counterparts, and thus requires little communication with the parameter server. We obtain theoretical guarantees for our proposed scheme under several settings, including approximate (sub-sampled) gradients and Hessians. Finally, we validate our theoretical findings with experiments on standard datasets under several types of Byzantine attacks.
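For context, the cubic-regularized Newton method of Nesterov and Polyak updates iterates by minimizing a local cubic model of the loss, $m(s) = g^\top s + \tfrac{1}{2} s^\top H s + \tfrac{M}{6}\|s\|^3$, at each step. A minimal single-machine sketch follows (this is our own illustrative code, not the paper's distributed, Byzantine-robust implementation; the gradient-descent subproblem solver and all parameter values are assumptions):

```python
import numpy as np

def cubic_newton_step(grad, hess, M, inner_iters=200, lr=0.1):
    """Approximately solve the cubic subproblem of Nesterov & Polyak:
        min_s  g^T s + 1/2 s^T H s + (M/6) ||s||^3
    via plain gradient descent on s (an illustrative solver choice;
    practical implementations use more sophisticated subproblem solvers)."""
    s = np.zeros_like(grad)
    for _ in range(inner_iters):
        # Gradient of the cubic model: g + H s + (M/2) ||s|| s
        model_grad = grad + hess @ s + 0.5 * M * np.linalg.norm(s) * s
        s -= lr * model_grad
    return s

# Toy usage on a strongly convex quadratic f(x) = 1/2 x^T A x - b^T x,
# where the exact gradient A x - b and Hessian A are available.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = np.zeros(2)
for _ in range(10):
    g = A @ x - b
    x = x + cubic_newton_step(g, A, M=1.0)
x_star = np.linalg.solve(A, b)  # the true minimizer
```

On this toy problem the iterates approach the minimizer rapidly; the cubic term $\tfrac{M}{6}\|s\|^3$ is what prevents overly aggressive steps and, on non-convex losses, lets the method exploit negative curvature to move away from saddle points.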