Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks. In realistic learning scenarios, heterogeneity across clients' local datasets poses an optimization challenge and may severely degrade generalization performance. In this paper, we investigate and identify the limitations of several decentralized optimization algorithms under different degrees of data heterogeneity. We propose a novel momentum-based method to mitigate this difficulty in decentralized training. In extensive experiments on various CV/NLP datasets (CIFAR-10, ImageNet, and AG News) and several network topologies (Ring and Social Network), we show that our method is much more robust to the heterogeneity of clients' data than existing methods, with a significant improvement in test performance ($1\% \!-\! 20\%$). Our code is publicly available.
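To make the setting concrete, the following is a minimal sketch of generic decentralized SGD with local heavy-ball momentum and gossip averaging on a ring. It is not the momentum-based method proposed in this paper. The scalar quadratic objectives $f_i(x) = \frac{1}{2}(x - c_i)^2$, the uniform mixing weights of $1/3$, the hyperparameters, and all function names are assumptions chosen for illustration; differing targets $c_i$ stand in for heterogeneous local data.

```python
"""Toy decentralized SGD with local heavy-ball momentum on a ring (illustrative only)."""


def ring_neighbors(i, n):
    """Indices client i averages over on a ring: left neighbor, itself, right neighbor."""
    return [(i - 1) % n, i, (i + 1) % n]


def gossip_average(values):
    """One gossip step with uniform weights 1/3 (assumes at least 3 clients)."""
    n = len(values)
    return [sum(values[j] for j in ring_neighbors(i, n)) / 3.0 for i in range(n)]


def decentralized_momentum_sgd(targets, lr=0.1, beta=0.9, steps=500):
    """Run the toy algorithm; client i minimizes 0.5 * (x - targets[i]) ** 2.

    Hypothetical setup: each client keeps its own momentum buffer, takes a
    local momentum step, then averages its model with its ring neighbors.
    """
    n = len(targets)
    x = [0.0] * n  # one scalar "model" per client
    m = [0.0] * n  # one local momentum buffer per client
    for _ in range(steps):
        grads = [x[i] - targets[i] for i in range(n)]    # local gradients
        m = [beta * m[i] + grads[i] for i in range(n)]   # local heavy-ball momentum
        x = gossip_average([x[i] - lr * m[i] for i in range(n)])  # step, then mix
    return x


def spread(values):
    """Disagreement between clients, measured as max minus min."""
    return max(values) - min(values)


if __name__ == "__main__":
    heterogeneous = decentralized_momentum_sgd([float(i) for i in range(8)])
    homogeneous = decentralized_momentum_sgd([3.5] * 8)
    print("mean (heterogeneous):", sum(heterogeneous) / len(heterogeneous))
    print("spread (heterogeneous):", spread(heterogeneous))
    print("spread (homogeneous):", spread(homogeneous))
```

In this toy problem, with a constant step size, the network average reaches the global minimizer in both cases, but the individual clients agree with each other only when their targets are identical. This gives a rough picture of why heterogeneity makes decentralized training harder; it says nothing about deep networks or about the results reported here.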