Distributed machine learning (DML) over time-varying networks can be an enabler for emerging decentralized ML applications such as autonomous driving and drone fleets. However, the weighted arithmetic mean model aggregation function commonly used in existing DML systems can result in high model loss, low model accuracy, and slow convergence speed over time-varying networks. To address this issue, in this paper we propose a novel nonlinear class of model aggregation functions to achieve efficient DML over time-varying networks. Instead of taking a linear aggregation of neighboring models as most existing studies do, our mechanism uses a nonlinear aggregation, a weighted power-p mean (WPM) where p is a positive odd integer, as the aggregation function of local models from neighbors. The subsequent optimization steps are taken using mirror descent defined by a Bregman divergence that preserves convergence to optimality. We analyze the properties of the WPM and rigorously prove the convergence properties of our aggregation mechanism. Additionally, through extensive experiments, we show that when p > 1, our design significantly improves the convergence speed of the model and the scalability of DML under time-varying networks compared with arithmetic mean aggregation functions, with little additional computation overhead.
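The abstract does not fix an implementation, but the aggregation rule it names is concrete enough to sketch. Below is a minimal NumPy illustration of what an element-wise weighted power-p mean step might look like, assuming nonnegative neighbor weights that sum to one; the function name `wpm_aggregate` and its signature are illustrative, not from the paper.

```python
import numpy as np

def wpm_aggregate(models, weights, p=3):
    """Illustrative weighted power-p mean (WPM) aggregation of neighbor models.

    Element-wise: ( sum_j w_j * x_j**p ) ** (1/p), with p a positive odd
    integer so that negative parameter values remain well-defined.
    Assumes weights are nonnegative and sum to one.
    """
    models = np.stack(models)                 # shape: (num_neighbors, dim)
    weights = np.asarray(weights)[:, None]    # broadcast over parameters
    s = np.sum(weights * models**p, axis=0)   # weighted sum of p-th powers
    # Real odd root of a possibly negative value: sign(s) * |s|**(1/p)
    return np.sign(s) * np.abs(s) ** (1.0 / p)

# Example: two neighbors, two parameters; p = 1 would reduce this
# to the weighted arithmetic mean the paper compares against.
agg = wpm_aggregate([np.array([0.2, -1.0]), np.array([0.4, 0.5])],
                    weights=[0.5, 0.5], p=3)
```

Restricting p to positive odd integers is what keeps the p-th root real-valued for negative model weights; with p = 1 the rule collapses to the standard weighted arithmetic mean.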