Decentralized learning algorithms enable the training of deep learning models over large distributed datasets generated at different devices and locations, without the need for a central server. In practical scenarios, the distributed datasets can have significantly different data distributions across the agents. Most current state-of-the-art decentralized algorithms assume the data distributions to be Independent and Identically Distributed (IID). This paper focuses on improving decentralized learning over non-IID data distributions with minimal compute and memory overhead. We propose Neighborhood Gradient Clustering (NGC), a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information. In particular, the proposed method replaces the local gradients of the model with the weighted mean of the self-gradients, the model-variant cross-gradients (derivatives of the received neighbors' model parameters with respect to the local dataset), and the data-variant cross-gradients (derivatives of the local model with respect to its neighbors' datasets). Further, we present CompNGC, a compressed version of NGC that reduces the communication overhead by $32 \times$ by compressing the cross-gradients. We demonstrate the empirical convergence and efficiency of the proposed technique over non-IID data distributions sampled from the CIFAR-10 dataset on various model architectures and graph topologies. Our experiments show that NGC and CompNGC outperform the existing state-of-the-art (SoTA) decentralized learning algorithms over non-IID data by $1-5\%$ with significantly lower compute and memory requirements. Further, the proposed NGC method outperforms the baseline by $5-40\%$ with no additional communication.
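The NGC update described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `ngc_local_update` and the single mixing weight `alpha` are assumptions, and gradients are represented as flat NumPy arrays rather than model parameter tensors.

```python
import numpy as np

def ngc_local_update(self_grad, model_variant_grads, data_variant_grads, alpha):
    """Weighted mean of self- and cross-gradients (illustrative sketch).

    self_grad            -- gradient of the local model on the local dataset
    model_variant_grads  -- list: gradients of each neighbor's model on the local dataset
    data_variant_grads   -- list: gradients of the local model on each neighbor's dataset
    alpha                -- assumed hyperparameter in [0, 1] weighting the cross-gradients
    """
    n = len(model_variant_grads)
    # Sum both families of cross-gradients (2n terms in total).
    cross = sum(model_variant_grads) + sum(data_variant_grads)
    # Weighted mean: (1 - alpha) on the self-gradient, alpha spread over the cross-gradients.
    return (1 - alpha) * self_grad + alpha * cross / (2 * n)
```

The returned vector would replace the local gradient in the agent's optimizer step; the actual weighting scheme used by NGC may differ from this uniform averaging.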
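The quoted $32 \times$ communication reduction in CompNGC is consistent with quantizing 32-bit floating-point cross-gradients down to 1 bit per element. The scaled-sign scheme below is a generic illustration of such 1-bit compression under that assumption; the paper's exact compressor may differ.

```python
import numpy as np

def compress_sign(grad):
    """1-bit compression: keep only the sign of each element plus one scale.

    Stores ~1 bit per element instead of 32, matching a 32x reduction
    (ignoring the small overhead of the scalar scale and shape).
    """
    scale = np.mean(np.abs(grad))          # single scalar preserving magnitude
    signs = np.signbit(grad)               # bool per element: True if negative
    return np.packbits(signs), scale, grad.shape

def decompress_sign(packed, scale, shape):
    """Reconstruct a scaled-sign approximation of the original gradient."""
    n = int(np.prod(shape))
    signs = np.unpackbits(packed, count=n).reshape(shape).astype(bool)
    return np.where(signs, -scale, scale)
```

An agent would compress its cross-gradients before sending them to neighbors, which then decompress and average; only the sign pattern and one scalar travel over the network.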