Distributed machine learning (DML) over time-varying networks can be an enabler for emerging decentralized ML applications such as autonomous driving and drone fleets. However, the weighted arithmetic mean commonly used as the model aggregation function in existing DML systems can result in high model loss, low model accuracy, and slow convergence over time-varying networks. To address this issue, in this paper we propose a novel nonlinear class of model aggregation functions to achieve efficient DML over time-varying networks. Instead of taking a linear aggregation of neighboring models as most existing studies do, our mechanism uses a nonlinear aggregation, the weighted power-p mean (WPM), to combine the local models of neighbors. The subsequent optimization steps are taken using mirror descent defined by a Bregman divergence, which preserves convergence to optimality. In this paper, we analyze the properties of the WPM and rigorously prove the convergence properties of our aggregation mechanism. Additionally, through extensive experiments, we show that when p > 1, our design significantly improves the convergence speed of the model and the scalability of DML under time-varying networks compared with the arithmetic mean aggregation function, with little additional computation overhead.
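To make the aggregation rule concrete, the following is a minimal sketch of weighted power-p mean aggregation of neighbor models. The function name, the NumPy setup, and the restriction to non-negative parameters (the power mean is defined on non-negative inputs for general p) are illustrative assumptions, not the paper's implementation; setting p = 1 recovers the usual weighted arithmetic mean.

```python
import numpy as np

def weighted_power_mean(models, weights, p):
    """Weighted power-p mean (WPM) of neighbor model parameters.

    models:  array of shape (n_neighbors, n_params); entries assumed
             non-negative, since the power mean is defined on
             non-negative inputs for general p (an assumption here)
    weights: array of shape (n_neighbors,), non-negative, summing to 1
    p:       exponent; p = 1 gives the weighted arithmetic mean
    """
    models = np.asarray(models, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # WPM_p(x; w) = (sum_i w_i * x_i^p)^(1/p), applied coordinate-wise
    return (weights[:, None] * models ** p).sum(axis=0) ** (1.0 / p)

# Example: aggregating three neighbor models with uniform weights and p = 2
neighbors = np.array([[0.2, 0.5], [0.4, 0.1], [0.6, 0.3]])
w = np.ones(3) / 3
aggregated = weighted_power_mean(neighbors, w, p=2)
```

By the power mean inequality, for p > 1 the WPM is coordinate-wise at least the weighted arithmetic mean, so larger neighbor values are weighted more heavily in the aggregate.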