SGD with momentum acceleration is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward approach is Distributed SGD with momentum (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Recently, several studies have addressed this issue and proposed momentum-accelerated methods that are more robust to data heterogeneity than DSGDm, although their convergence rates remain dependent on data heterogeneity and degrade when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, a method with momentum acceleration whose convergence rate is provably independent of data heterogeneity. More specifically, we analyze the convergence rate of Momentum Tracking in the standard deep learning setting, where the objective function is non-convex and stochastic gradients are used, and show that it is independent of data heterogeneity for any momentum coefficient $\beta \in [0, 1)$. Through image classification tasks, we demonstrate that Momentum Tracking is more robust to data heterogeneity than existing decentralized learning methods with momentum acceleration and consistently outperforms them when the data distributions are heterogeneous.
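To illustrate the idea of combining momentum with gradient tracking, the following is a minimal toy sketch, not the paper's exact algorithm: four nodes with heterogeneous quadratic objectives $f_i(x) = \tfrac{1}{2}(x - b_i)^2$ run gossip averaging over a ring topology, each maintaining a tracking variable `y` that estimates the average gradient and a momentum buffer `m` applied to it. The update order and the mixing matrix are assumptions for this illustration.

```python
import numpy as np

n = 4                                 # number of nodes
beta, eta, T = 0.9, 0.05, 400         # momentum, step size, iterations
b = np.array([0.0, 1.0, 2.0, 3.0])    # heterogeneous local optima

# doubly stochastic mixing matrix for a ring topology
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

def grad(x):
    # local gradients of f_i(x) = 0.5 * (x - b_i)^2
    return x - b

x = np.zeros(n)
y = grad(x)                # tracking variable, initialized to local gradients
m = np.zeros(n)            # momentum buffers
g_prev = grad(x)
for _ in range(T):
    m = beta * m + y       # momentum applied to the tracked gradient (assumed form)
    x = W @ x - eta * m    # gossip step plus momentum update
    g = grad(x)
    y = W @ y + g - g_prev # gradient-tracking correction
    g_prev = g

# every node approaches the global optimum mean(b) = 1.5
# despite the heterogeneous local objectives
```

Because the tracking variable `y` averages to the true global gradient at every iteration, the momentum buffer accumulates an unbiased search direction, which is the intuition behind the heterogeneity-independent rate; plain DSGDm would instead accumulate each node's biased local gradient.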