Decentralized learning algorithms empower interconnected devices to share data and computational resources in order to collaboratively train a machine learning model without the aid of a central coordinator. When the data distributions at the network nodes are heterogeneous, collaboration can yield predictors with unsatisfactory performance for a subset of the devices. For this reason, in this work we formulate a distributionally robust decentralized learning task and propose a decentralized single-loop gradient descent/ascent algorithm (AD-GDA) to directly solve the underlying minimax optimization problem. We render our algorithm communication-efficient by employing a compressed consensus scheme, and we provide convergence guarantees for smooth convex and non-convex loss functions. Finally, we corroborate the theoretical findings with empirical results that highlight AD-GDA's ability to provide unbiased predictors and to greatly improve communication efficiency compared to existing distributionally robust algorithms.
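To make the setting concrete, the sketch below simulates a decentralized single-loop descent/ascent scheme with a compressed consensus step for the distributionally robust objective min_theta max_{lambda in simplex} sum_i lambda_i f_i(theta). It is a minimal illustration under stated assumptions, not the paper's exact AD-GDA update rules: the local least-squares losses, the ring mixing matrix, the top-k compressor, the residual handling, the step sizes, and the centrally simulated dual update are all illustrative choices.

```python
import numpy as np

# Minimal sketch (not the exact AD-GDA updates) of decentralized single-loop
# gradient descent/ascent with compressed consensus for the distributionally
# robust objective: min_theta max_{lambda in simplex} sum_i lambda_i * f_i(theta).

rng = np.random.default_rng(0)
n_nodes, dim = 4, 10
# heterogeneous local datasets (illustrative least-squares problems per node)
A = [rng.standard_normal((20, dim)) for _ in range(n_nodes)]
b = [rng.standard_normal(20) for _ in range(n_nodes)]

def loss(i, theta):          # local loss f_i(theta)
    r = A[i] @ theta - b[i]
    return 0.5 * r @ r / len(b[i])

def grad(i, theta):          # gradient of f_i
    return A[i].T @ (A[i] @ theta - b[i]) / len(b[i])

def top_k(x, k=3):           # top-k sparsification as the compression operator
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def project_simplex(v):      # Euclidean projection onto the probability simplex
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    tau = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + tau, 0)

# doubly stochastic ring mixing matrix (assumed communication topology)
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i], W[i, (i - 1) % n_nodes], W[i, (i + 1) % n_nodes] = 0.5, 0.25, 0.25

theta = np.zeros((n_nodes, dim))       # one model copy per node
lam = np.full(n_nodes, 1.0 / n_nodes)  # adversarial mixture weights
eta_theta, eta_lam = 0.1, 0.05

for t in range(200):
    # compressed consensus: nodes exchange compressed models, mix them,
    # and keep the uncompressed residual locally
    compressed = np.stack([top_k(theta[i]) for i in range(n_nodes)])
    mixed = W @ compressed + (theta - compressed)

    # single-loop step: primal descent weighted by lambda, then dual ascent
    theta = np.stack([mixed[i] - eta_theta * n_nodes * lam[i] * grad(i, mixed[i])
                      for i in range(n_nodes)])
    losses = np.array([loss(i, theta[i]) for i in range(n_nodes)])
    lam = project_simplex(lam + eta_lam * losses)   # simulated dual update

print("worst-node loss:", losses.max(), "weights:", np.round(lam, 3))
```

The dual ascent step pushes weight toward the nodes with the largest local losses, which is what produces the distributionally robust (worst-case aware) predictor; in a fully decentralized deployment this dual variable would itself be tracked through local communication rather than computed centrally as in this toy simulation.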