Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentralized gradient descent (DGD) and a variety of loss functions that asymptote to zero at infinity (including the exponential and logistic losses), we derive novel finite-time generalization bounds. This complements a long line of recent work that studies the generalization performance and the implicit bias of gradient descent on separable data, but which has thus far been limited to centralized learning scenarios. Notably, our generalization bounds approximately match the order of their centralized counterparts. Central to this result, and of independent interest, are novel bounds on the training loss and the rate of consensus of DGD for a class of self-bounded losses. Finally, on the algorithmic front, we design improved gradient-based routines for decentralized learning with separable data and empirically demonstrate orders-of-magnitude speed-ups in both training and generalization performance.
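To make the setting concrete, the sketch below shows one possible implementation of the standard DGD update (a consensus/averaging step over the graph followed by a local gradient step) with the logistic loss on synthetic linearly separable data. The mixing matrix, step size, and data generation are illustrative assumptions, not the configuration analyzed in the paper.

```python
# Minimal DGD sketch: K agents on a graph with a doubly stochastic mixing
# matrix W; agent k holds local data (X_k, y_k) with labels in {-1, +1}.
# Illustrative only; not the paper's exact algorithm or step-size schedule.
import numpy as np

def logistic_loss_grad(w, X, y):
    """Gradient of the average logistic loss (1/n) * sum log(1 + exp(-y_i x_i^T w))."""
    margins = y * (X @ w)
    s = np.exp(-np.logaddexp(0.0, margins))   # sigmoid(-margins), computed stably
    return -(X.T @ (y * s)) / len(y)

def dgd(W, data, eta=0.1, T=1000):
    """Each agent averages neighbors' iterates via W, then takes a local gradient step."""
    K = W.shape[0]
    d = data[0][0].shape[1]
    w = np.zeros((K, d))                      # one local model per agent
    for _ in range(T):
        w_mixed = W @ w                       # consensus (gossip) step over the graph
        grads = np.stack([logistic_loss_grad(w[k], *data[k]) for k in range(K)])
        w = w_mixed - eta * grads             # local gradient step
    return w

if __name__ == "__main__":
    # Example: 3 agents with a symmetric, doubly stochastic mixing matrix.
    rng = np.random.default_rng(0)
    W = np.array([[0.50, 0.25, 0.25],
                  [0.25, 0.50, 0.25],
                  [0.25, 0.25, 0.50]])
    w_star = rng.normal(size=5)
    data = []
    for _ in range(3):
        X = rng.normal(size=(50, 5))
        y = np.sign(X @ w_star)               # linearly separable labels
        data.append((X, y))
    w = dgd(W, data, eta=0.5, T=2000)
    print("max disagreement from average:", np.max(np.abs(w - w.mean(axis=0))))
```

On separable data the local iterates grow in norm while the loss decays toward zero, so the printed disagreement gives a rough empirical view of the rate-of-consensus behavior the bounds describe.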