Distributed stochastic optimization has drawn great attention recently due to its effectiveness in solving large-scale machine learning problems. However, although numerous algorithms have been proposed with empirical success, their theoretical guarantees remain restrictive and rely on certain boundedness conditions on the stochastic gradients, ranging from uniform boundedness to the relaxed growth condition. In addition, how to characterize the data heterogeneity among the agents and its impact on algorithmic performance remains challenging. Motivated by these issues, we revisit the classical FedAvg algorithm for solving the distributed stochastic optimization problem and establish convergence results under only a mild variance condition on the stochastic gradients for smooth nonconvex objective functions. Almost sure convergence to a stationary point is also established under the same condition. Moreover, we discuss a more informative measure of data heterogeneity as well as its implications.
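To make the contrast concrete, the following is a sketch of these gradient conditions in commonly used notation, where $f_i$ denotes agent $i$'s local objective and $\nabla F_i(x;\xi)$ a stochastic gradient; the constants $G$, $\alpha$, $\beta$, $\sigma$ and the exact formulations adopted in the paper may differ:

\begin{align*}
\text{uniform boundedness:} \quad & \mathbb{E}\left\|\nabla F_i(x;\xi)\right\|^2 \le G^2, \\
\text{relaxed growth:} \quad & \mathbb{E}\left\|\nabla F_i(x;\xi)\right\|^2 \le \alpha \left\|\nabla f_i(x)\right\|^2 + \beta, \\
\text{variance condition:} \quad & \mathbb{E}\left\|\nabla F_i(x;\xi) - \nabla f_i(x)\right\|^2 \le \sigma^2.
\end{align*}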