Large-scale non-convex optimization problems are expensive to solve due to computational and memory costs. To reduce these costs, first-order (computationally efficient) and asynchronous-parallel (memory efficient) algorithms are necessary for minimizing non-convex functions in machine learning. However, asynchronous first-order methods applied in non-convex settings face two difficulties: (i) parallelization delays, which affect convergence by disrupting the monotonicity of first-order methods, and (ii) sub-optimal saddle points, where the gradient is zero. To address these two difficulties, we propose an asynchronous coordinate gradient descent algorithm shown to converge to local minima under a bounded delay. Our algorithm overcomes parallelization-delay issues by using a carefully constructed Hamiltonian function. We prove that our designed kinetic-energy term, incorporated within the Hamiltonian, allows our algorithm to decrease monotonically per iteration. Next, our algorithm steers iterates clear of saddle points by utilizing a perturbation sub-routine. Like other state-of-the-art (SOTA) algorithms, we achieve a convergence rate that is poly-logarithmic in the dimension. Unlike other SOTA algorithms, which are synchronous, our work is the first to study how parallelization delays affect the convergence rate of asynchronous first-order algorithms. We prove that our algorithm outperforms synchronous counterparts under large parallelization delays, with a convergence rate that depends sublinearly on the delay. To our knowledge, this is the first local-optima convergence result for a first-order asynchronous algorithm in the non-convex setting.
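To make the two ingredients above concrete, the following is a minimal, sequential sketch (not the paper's algorithm, and without the Hamiltonian machinery or asynchrony) of coordinate gradient descent with a perturbation sub-routine: whenever the gradient is nearly zero, noise is injected so that iterates can escape strict saddle points rather than stall at them. All function names, step sizes, and thresholds here are illustrative assumptions.

```python
import numpy as np

def perturbed_coordinate_gd(grad, x0, step=0.05, g_thresh=1e-3,
                            perturb_radius=1e-2, n_iters=2000, seed=0):
    """Illustrative sketch (not the paper's method): cyclic coordinate
    gradient descent with a perturbation sub-routine that injects
    uniform noise whenever the full gradient is small, so iterates can
    escape strict saddle points. All parameters are assumed values."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    for t in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) < g_thresh:
            # Near a first-order stationary point: perturb to escape
            # a possible saddle (local minima are only left briefly).
            x = x + rng.uniform(-perturb_radius, perturb_radius, size=d)
            continue
        i = t % d            # cyclic coordinate choice
        x[i] -= step * g[i]  # single-coordinate gradient step
    return x

# Example: f(x, y) = (x^2 - 1)^2 + y^2 has a strict saddle at the
# origin and local minima at (+1, 0) and (-1, 0).
def grad_f(z):
    return np.array([4.0 * z[0] * (z[0]**2 - 1.0), 2.0 * z[1]])

# Starting exactly at the saddle (0, 0), the perturbation fires first,
# after which the coordinate steps descend toward a local minimum.
x_final = perturbed_coordinate_gd(grad_f, np.zeros(2))
```

Starting exactly at the saddle, plain coordinate descent would never move (the gradient is zero there); the perturbation step is what breaks the tie, which is the role the abstract's perturbation sub-routine plays in the non-convex setting.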