Decentralized learning algorithms empower interconnected devices to share data and computational resources to collaboratively train a machine learning model without the aid of a central coordinator. When the data distributions at the network nodes are heterogeneous, collaboration can yield predictors with unsatisfactory performance for a subset of the devices. For this reason, in this work we formulate a distributionally robust decentralized learning task and propose a decentralized single-loop gradient descent/ascent algorithm (AD-GDA) to directly solve the underlying minimax optimization problem. We render our algorithm communication-efficient by employing a compressed consensus scheme, and we provide convergence guarantees for smooth convex and non-convex loss functions. Finally, we corroborate the theoretical findings with empirical results that highlight AD-GDA's ability to provide unbiased predictors and to greatly improve communication efficiency compared to existing distributionally robust algorithms.
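For concreteness, a standard form of the distributionally robust minimax problem referenced above optimizes the model parameters against the worst-case mixture of the per-node losses. The display below is an illustrative sketch of this common setup, not the paper's exact formulation: the per-node empirical losses $f_k$, the number of nodes $K$, the mixture weights $\lambda$, and the step sizes $\eta_\theta, \eta_\lambda$ are assumed notation.

\[
\min_{\theta} \; \max_{\lambda \in \Delta_K} \; \sum_{k=1}^{K} \lambda_k f_k(\theta),
\qquad
\Delta_K = \Big\{ \lambda \in \mathbb{R}^K_{\geq 0} : \textstyle\sum_{k=1}^{K} \lambda_k = 1 \Big\},
\]

and a single-loop gradient descent/ascent scheme then alternates, at iteration $t$,

\[
\theta^{t+1} = \theta^{t} - \eta_\theta \sum_{k=1}^{K} \lambda_k^{t} \, \nabla_\theta f_k(\theta^{t}),
\qquad
\lambda^{t+1} = \Pi_{\Delta_K}\!\big( \lambda^{t} + \eta_\lambda \, f(\theta^{t}) \big),
\]

where $f(\theta) = \big(f_1(\theta), \dots, f_K(\theta)\big)$ and $\Pi_{\Delta_K}$ denotes Euclidean projection onto the simplex. In the decentralized setting, the descent step would be carried out locally at each node and combined through a (compressed) consensus averaging step rather than through a central coordinator.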