We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and the optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by the fact that they recover known statements when we plug in $H=1$, where $H$ is the number of local steps. Empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
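For readers unfamiliar with the method, the following is a minimal sketch of local SGD in Python, not the implementation used in this work. Each worker takes $H$ local gradient steps from the shared iterate, and the local iterates are then averaged. The function names `local_sgd` and `make_quadratic_grads`, the quadratic objectives $f_m(x) = \frac{1}{2}\|x - c_m\|^2$, and the Gaussian gradient noise are illustrative assumptions, not taken from the paper. Setting all centers $c_m$ equal gives the identical-data regime; distinct centers give a heterogeneous one.

```python
import numpy as np


def local_sgd(grad_fns, x0, stepsize, H, rounds):
    """Sketch of local SGD: H local steps per worker, then averaging.

    grad_fns: one (possibly stochastic) gradient oracle per worker.
    H: number of local steps between two communication rounds.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(rounds):
        local_iterates = []
        for grad in grad_fns:
            y = x.copy()  # every worker starts from the shared iterate
            for _ in range(H):
                y = y - stepsize * grad(y)  # local (stochastic) gradient step
            local_iterates.append(y)
        x = np.mean(local_iterates, axis=0)  # communication: average the iterates
    return x


def make_quadratic_grads(centers, noise_std=0.0, seed=0):
    """Hypothetical test problem: f_m(x) = 0.5 * ||x - c_m||^2 with Gaussian noise."""
    rng = np.random.default_rng(seed)

    def make(c):
        return lambda y: (y - c) + noise_std * rng.standard_normal(y.shape)

    return [make(np.asarray(c, dtype=float)) for c in centers]


# Heterogeneous data: the two workers have different minimizers.
grads = make_quadratic_grads([[0.0, 0.0], [2.0, 2.0]])
x_final = local_sgd(grads, x0=[1.0, -1.0], stepsize=0.1, H=5, rounds=200)
```

With $H=1$, one round of this sketch coincides with a single step of SGD on the averaged gradient, which is the setting in which known results are recovered.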