This paper presents fault-tolerant asynchronous Stochastic Gradient Descent (SGD) algorithms. SGD is widely used for approximating the minimum of a cost function $Q$, and is a core component of many optimization and learning algorithms. Our algorithms are designed for the cluster-based model, which combines message-passing and shared-memory communication layers. Processes may fail by crashing, and the algorithm inside each cluster is wait-free, using only reads and writes. For a strongly convex function $Q$, our algorithm tolerates any number of failures and provides a convergence rate that achieves the maximal distributed acceleration over the optimal convergence rate of sequential SGD. For arbitrary functions satisfying standard assumptions on $Q$, the convergence rate includes an additional term that depends on the maximal difference between the parameters held by processes at the same iteration; in this case, the algorithm matches the convergence rate of sequential SGD up to a logarithmic factor. This is achieved by running, at each iteration, a multidimensional approximate agreement algorithm tailored to the cluster-based model. The algorithm for arbitrary functions requires that a majority of the clusters contain at least one nonfaulty process, and we prove that this condition is necessary when optimizing some non-convex functions.
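As a point of reference, the sequential SGD baseline mentioned above performs the standard update sketched below; this sketch is illustrative only, and its notation (the step size $\eta_t$ and the stochastic gradient oracle $\tilde g$) is introduced here rather than taken from the paper:
\[
  x_{t+1} \;=\; x_t \;-\; \eta_t\, \tilde g(x_t),
  \qquad
  \mathbb{E}\big[\tilde g(x_t)\big] \;=\; \nabla Q(x_t).
\]
In the cluster-based setting, each process $i$ maintains its own iterate $x_t^i$, and for arbitrary functions the convergence bound picks up the extra term depending on the per-iteration disagreement $\max_{i,j} \lVert x_t^i - x_t^j \rVert$; the multidimensional approximate agreement step run at each iteration serves to keep this disagreement small.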