Understanding the convergence performance of the asynchronous stochastic gradient descent method (Async-SGD) has received increasing attention in recent years due to its foundational role in machine learning. To date, however, most existing works are restricted to either bounded gradient delays or convex settings. In this paper, we focus on Async-SGD and its variant Async-SGDI (which uses an increasing batch size) for non-convex optimization problems with unbounded gradient delays. We prove an $o(1/\sqrt{k})$ convergence rate for Async-SGD and an $o(1/k)$ rate for Async-SGDI. We also establish a unifying sufficient condition for the convergence of Async-SGD, which includes two major gradient delay models in the literature as special cases and yields a new delay model not considered thus far.
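For reference, the generic Async-SGD step under a (possibly unbounded) gradient delay can be sketched as
$$
x_{k+1} = x_k - \alpha_k \, \nabla F\!\left(x_{k-\tau_k};\, \xi_k\right), \qquad \tau_k \ge 0,
$$
where $\alpha_k$ is the step size, $\xi_k$ a random sample, and $\tau_k$ the staleness of the gradient returned by an asynchronous worker; Async-SGDI replaces the single stochastic gradient with a mini-batch average whose batch size grows with $k$. The symbols here follow common usage and are illustrative only; they need not match this paper's exact notation.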