Emerging distributed applications have recently driven the development of decentralized machine learning, especially in IoT and edge computing settings. In real-world scenarios, the common problems of non-convexity and data heterogeneity lead to inefficiency, performance degradation, and stagnation. Most existing studies focus on only one of these issues, lacking a more general framework with proven guarantees. To this end, we propose a unified paradigm called UMP, comprising two algorithms, D-SUM and GT-DSUM, that combine the momentum technique with decentralized stochastic gradient descent (SGD). The former provides a convergence guarantee for general non-convex objectives, while the latter extends it with gradient tracking, which estimates the global optimization direction to mitigate data heterogeneity (i.e., distribution drift). By choosing different parameters, UMP covers most momentum-based variants built on the classical heavy-ball method or Nesterov's acceleration. We rigorously establish the convergence of both approaches for non-convex objectives and conduct extensive experiments, demonstrating an improvement in model accuracy of up to 57.6% over other methods in practice.
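For reference, the classical single-node momentum updates that such variants build on (heavy ball and Nesterov's acceleration) can be written as follows; this is only a reminder of the standard forms, not the decentralized D-SUM/GT-DSUM updates, which additionally involve neighbor averaging and, for GT-DSUM, a gradient-tracking correction. Here $\eta$ is a step size and $\beta$ the momentum parameter:

\begin{align*}
  &\text{Heavy ball (Polyak):} &
  v_{t+1} &= \beta v_t - \eta \nabla f(x_t), &
  x_{t+1} &= x_t + v_{t+1}, \\
  &\text{Nesterov's acceleration:} &
  v_{t+1} &= \beta v_t - \eta \nabla f(x_t + \beta v_t), &
  x_{t+1} &= x_t + v_{t+1}.
\end{align*}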