Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentralized gradient descent (DGD) and a variety of loss functions that asymptote to zero at infinity (including exponential and logistic losses), we derive novel finite-time generalization bounds. This complements a long line of recent work studying the generalization performance and the implicit bias of gradient descent on separable data, which has thus far been limited to centralized learning scenarios. Notably, our generalization bounds match their centralized counterparts in order. Critical to this result, and of independent interest, is establishing novel bounds on the training loss and the rate of consensus of DGD for a class of self-bounded losses. Finally, on the algorithmic front, we design improved gradient-based routines for decentralized learning with separable data and empirically demonstrate orders-of-magnitude speed-ups in both training and generalization performance.
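To make the setting concrete, the following is a minimal sketch (not the paper's implementation) of decentralized gradient descent with logistic loss on separable data: each agent holds a local dataset and a local model, averages its neighbors' models through a doubly stochastic mixing matrix, and then takes a local gradient step. The names `W`, `eta`, and `T` are illustrative assumptions, not quantities defined in the abstract.

```python
# Minimal DGD sketch: each agent i holds (X_i, y_i) and a local model w_i.
# One iteration = consensus step (mix neighbors' models via W) + local gradient step.
import numpy as np

def logistic_grad(w, X, y):
    """Gradient of the average logistic loss (1/n) * sum log(1 + exp(-y_j <w, x_j>))."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))           # derivative of log(1 + exp(-m)) w.r.t. m
    return X.T @ coeff / len(y)

def dgd(local_data, W, eta=0.1, T=1000):
    """Run T iterations of decentralized gradient descent.

    local_data: list of (X_i, y_i) pairs, one per agent, labels in {-1, +1}.
    W:          (m x m) doubly stochastic mixing matrix respecting the graph.
    Returns the (m x d) matrix of local models after T iterations.
    """
    m = len(local_data)
    d = local_data[0][0].shape[1]
    models = np.zeros((m, d))                      # one row per agent
    for _ in range(T):
        mixed = W @ models                         # consensus: average over neighbors
        grads = np.stack([logistic_grad(mixed[i], X, y)
                          for i, (X, y) in enumerate(local_data)])
        models = mixed - eta * grads               # local gradient step
    return models
```

On linearly separable data the local losses can be driven toward zero, so the models' norms grow without bound while the agents' iterates are kept close by the mixing step; the abstract's consensus-rate and training-loss bounds quantify this behavior for self-bounded losses.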