Decentralized learning enables a group of collaborative agents to learn models using a distributed dataset without the need for a central parameter server. Recently, decentralized learning algorithms have demonstrated state-of-the-art results on benchmark datasets, comparable to centralized algorithms. However, this competitive performance rests on the key assumption that the data is independently and identically distributed (IID) across the agents, an assumption that often does not hold in real-life applications. Inspired by ideas from continual learning, we propose Cross-Gradient Aggregation (CGA), a novel decentralized learning algorithm in which (i) each agent aggregates cross-gradient information, i.e., derivatives of its model with respect to its neighbors' datasets, and (ii) updates its model using a projected gradient obtained via quadratic programming (QP). We theoretically analyze the convergence characteristics of CGA and demonstrate its efficiency on non-IID data distributions sampled from the MNIST and CIFAR-10 datasets. Our empirical comparisons show that CGA outperforms existing state-of-the-art decentralized learning algorithms, and that this advantage is maintained when the communicated information is compressed to reduce peer-to-peer communication overhead.
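Since CGA draws its projection step from continual learning, the following minimal Python sketch illustrates one plausible form of that step: a GEM-style QP that finds the update direction closest to an agent's local gradient while keeping a non-negative inner product with every cross-gradient received from its neighbors. The function name `project_gradient` and the use of SciPy's SLSQP solver are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def project_gradient(local_grad, cross_grads):
    """Hedged sketch of a GEM-style QP projection: return the direction z
    closest to the agent's local gradient such that z does not conflict
    with any neighbor's cross-gradient (i.e., <z, g_j> >= 0 for all j)."""
    # Objective: minimize 0.5 * ||z - g||^2 over candidate directions z
    obj = lambda z: 0.5 * np.sum((z - local_grad) ** 2)
    jac = lambda z: z - local_grad
    # One inequality constraint per cross-gradient: g_j . z >= 0
    cons = [{"type": "ineq", "fun": lambda z, gj=gj: gj @ z}
            for gj in cross_grads]
    res = minimize(obj, x0=local_grad, jac=jac,
                   constraints=cons, method="SLSQP")
    return res.x

# Toy usage: a 5-dimensional model, one local gradient, two neighbors
rng = np.random.default_rng(0)
g = rng.normal(size=5)                                    # agent's own gradient
neighbor_grads = [rng.normal(size=5) for _ in range(2)]   # cross-gradients
z = project_gradient(g, neighbor_grads)
print(z)  # satisfies z @ gj >= 0 (up to solver tolerance) for each neighbor
```

In a full CGA round, as the abstract describes, each agent would first obtain cross-gradients of its own model computed on its neighbors' local datasets, solve the QP above, and then take an SGD step along the projected direction z.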