The stability and generalization of stochastic gradient-based methods provide valuable insights into the algorithmic performance of machine learning models. As the main workhorse of deep learning, stochastic gradient descent (SGD) has been studied extensively; nevertheless, its decentralized variants have received little attention. In this paper, we provide a novel formulation of decentralized stochastic gradient descent. Leveraging this formulation together with (non)convex optimization theory, we establish the first stability and generalization guarantees for decentralized stochastic gradient descent. Our theoretical results rest on a few common and mild assumptions and reveal, for the first time, that decentralization deteriorates the stability of SGD. We verify our theoretical findings on a variety of decentralized settings and benchmark machine learning models.
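As context for the decentralized setting studied above, the following is a minimal sketch of a generic decentralized SGD round: each node averages its parameters with its neighbours' through a doubly stochastic mixing matrix, then takes a local stochastic gradient step on its own data. All names (`decentralized_sgd_step`, `mixing_matrix`, `local_grad`, the ring topology, the toy quadratic losses) are illustrative assumptions, not the paper's exact formulation or notation.

```python
import numpy as np

# Hypothetical sketch of decentralized SGD (D-SGD) over a fixed communication graph.
# Each node i holds its own parameter vector; one round consists of
# (1) gossip averaging with neighbours via a doubly stochastic matrix W, and
# (2) a local stochastic gradient step on node i's own minibatch.

def decentralized_sgd_step(params, mixing_matrix, local_grad, lr):
    """One D-SGD round.

    params:        (n_nodes, dim) array, row i = parameters held by node i
    mixing_matrix: (n_nodes, n_nodes) doubly stochastic matrix W, with W[i, j] > 0
                   only if nodes i and j are neighbours in the graph
    local_grad:    callable(node_index, x) -> stochastic gradient at x on that node's data
    lr:            learning rate
    """
    # Consensus step: x_i <- sum_j W[i, j] * x_j
    mixed = mixing_matrix @ params
    # Local stochastic gradient step on each node's own data
    grads = np.stack([local_grad(i, mixed[i]) for i in range(params.shape[0])])
    return mixed - lr * grads


if __name__ == "__main__":
    # Toy example: 4 nodes on a ring, local losses f_i(x) = 0.5 * ||x - b_i||^2
    rng = np.random.default_rng(0)
    n_nodes, dim = 4, 3
    targets = rng.normal(size=(n_nodes, dim))

    # Doubly stochastic ring mixing matrix: 1/2 self-weight, 1/4 to each neighbour
    W = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        W[i, i] = 0.5
        W[i, (i - 1) % n_nodes] = 0.25
        W[i, (i + 1) % n_nodes] = 0.25

    def grad(i, x):
        # Noisy gradient of f_i, mimicking a stochastic minibatch gradient
        return (x - targets[i]) + 0.01 * rng.normal(size=x.shape)

    x = np.zeros((n_nodes, dim))
    for _ in range(200):
        x = decentralized_sgd_step(x, W, grad, lr=0.1)

    # Nodes approximately reach consensus near the minimizer of the average loss
    print(np.allclose(x.mean(axis=0), targets.mean(axis=0), atol=0.1))
```

In this sketch, the mixing matrix encodes the communication topology; how far it is from perfect averaging (e.g., its spectral gap) is exactly the kind of quantity through which decentralization can be expected to affect stability, in line with the abstract's claim.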