We consider distributed learning with constant-stepsize SGD (DSGD) over several devices, each of which sends a final model update to a central server. In a final step, the local estimates are aggregated. In the setting of overparameterized linear regression, we prove general upper bounds with matching lower bounds and derive learning rates for specific data-generating distributions. We show that the excess risk is of the order of the variance, provided the number of local nodes does not grow too fast relative to the global sample size. We further compare the sample complexity of DSGD with that of distributed ridge regression (DRR) and show that the excess SGD risk is smaller than the excess RR risk, while both sample complexities are of the same order.
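The one-shot scheme described above can be illustrated with a minimal NumPy sketch. It is not the paper's exact procedure. It rests on assumptions not stated in the abstract: each node makes a single pass over its own shard, returns the average of its SGD iterates, and the server takes a uniform average of the local estimates. The names `local_sgd`, `dsgd`, and `excess_risk` are hypothetical, and the excess-risk helper assumes a well-specified linear model with a diagonal covariance.

```python
import numpy as np


def local_sgd(X, y, stepsize):
    """One pass of constant-stepsize SGD on the squared loss.

    Assumption: the node returns the average of its iterates,
    not the last iterate.
    """
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for i in range(n):
        # Stochastic gradient of 0.5 * (x_i^T w - y_i)^2 with respect to w.
        grad = (X[i] @ w - y[i]) * X[i]
        w = w - stepsize * grad
        w_sum += w
    return w_sum / n


def dsgd(X, y, num_nodes, stepsize):
    """Split the data across nodes, run SGD locally, average once.

    Assumption: the server uses a uniform average of the local estimates.
    """
    X_parts = np.array_split(X, num_nodes)
    y_parts = np.array_split(y, num_nodes)
    local = [local_sgd(Xm, ym, stepsize) for Xm, ym in zip(X_parts, y_parts)]
    return np.mean(local, axis=0)


def excess_risk(w, w_star, cov_diag):
    """Excess risk (w - w*)^T H (w - w*) for a diagonal covariance H."""
    diff = w - w_star
    return float(np.sum(cov_diag * diff ** 2))
```

To try it in an overparameterized regime, one could draw Gaussian features with a polynomially decaying spectrum (for example `cov_diag = 1 / k**2`), choose a dimension larger than the local sample size, and compare `excess_risk` across different values of `num_nodes`. These choices are illustrative and are not taken from the paper.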