Decentralized learning enables a group of collaborative agents to learn models using a distributed dataset without the need for a central parameter server. Recently, decentralized learning algorithms have demonstrated state-of-the-art results on benchmark datasets, comparable with centralized algorithms. However, the key assumption behind this competitive performance is that the data is independently and identically distributed (IID) among the agents, which often does not hold in real-life applications. Inspired by ideas from continual learning, we propose Cross-Gradient Aggregation (CGA), a novel decentralized learning algorithm where (i) each agent aggregates cross-gradient information, i.e., derivatives of its model with respect to its neighbors' datasets, and (ii) updates its model using a projected gradient based on quadratic programming (QP). We theoretically analyze the convergence characteristics of CGA and demonstrate its efficiency on non-IID data distributions sampled from the MNIST and CIFAR-10 datasets. Our empirical comparisons show that CGA outperforms existing state-of-the-art decentralized learning algorithms and maintains this improved performance under information compression schemes that reduce peer-to-peer communication overhead. The code is available on GitHub.
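To make step (ii) concrete, the sketch below shows one QP-based gradient projection in the style of Gradient Episodic Memory, the continual-learning technique that inspires CGA. The function name, the choice of the average cross-gradient as the reference point, and the use of `scipy.optimize.minimize` for the dual QP are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of one CGA-style update, assuming a GEM-style QP projection
# (Lopez-Paz & Ranzato, 2017). All names here are hypothetical.
import numpy as np
from scipy.optimize import minimize

def project_gradient(cross_grads):
    """Find an update direction aligned with every cross-gradient.

    cross_grads: (m, d) array; row j is the gradient of agent i's model
    evaluated on neighbor j's local data (the "cross-gradient").
    Solves   min_g 0.5 * ||g - g_avg||^2   s.t.   cross_grads @ g >= 0
    via its dual, a small m-dimensional QP with nonnegativity constraints.
    """
    G = np.asarray(cross_grads)            # (m, d) stacked cross-gradients
    g_avg = G.mean(axis=0)                 # naive aggregate as reference point
    GG = G @ G.T                           # (m, m) Gram matrix
    Gg = G @ g_avg

    # Dual QP:  min_v 0.5 * v^T GG v + Gg^T v   s.t.  v >= 0
    def dual(v):
        return 0.5 * v @ GG @ v + Gg @ v

    m = G.shape[0]
    res = minimize(dual, np.zeros(m), bounds=[(0, None)] * m)
    # Recover the primal (projected) gradient from the dual solution.
    return g_avg + G.T @ res.x

# Toy usage: 3 neighbors, a 5-dimensional parameter vector.
rng = np.random.default_rng(0)
grads = rng.normal(size=(3, 5))            # stand-in cross-gradients
theta = rng.normal(size=5)                 # agent's model parameters
theta -= 0.1 * project_gradient(grads)     # gradient-descent-style update
```

The projection keeps the update direction at a nonnegative inner product with each neighbor's cross-gradient, so no agent's local objective is degraded by the step; this is the mechanism the abstract credits for CGA's robustness to non-IID data.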