The FedProx algorithm is a simple yet powerful distributed proximal point optimization method widely used for federated learning (FL) over heterogeneous data. Despite its popularity and remarkable success in practice, the theoretical understanding of FedProx remains largely under-investigated: its appealing convergence behavior has so far been characterized only under certain non-standard and unrealistic dissimilarity assumptions on the local functions, and the results are limited to smooth optimization problems. To remedy these deficiencies, we develop a novel local-dissimilarity-invariant convergence theory for FedProx and its minibatch stochastic extension through the lens of algorithmic stability. As a result, we derive several new and deeper insights into FedProx for non-convex federated optimization, including: 1) convergence guarantees independent of local-dissimilarity-type conditions; 2) convergence guarantees for non-smooth FL problems; and 3) linear speedup with respect to the minibatch size and the number of sampled devices. Our theory reveals for the first time that local dissimilarity and smoothness are not prerequisites for FedProx to attain favorable complexity bounds. Preliminary experimental results on a series of benchmark FL datasets are reported to demonstrate the benefit of minibatching for improving the sample efficiency of FedProx.
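For context, a minimal sketch of the standard FedProx local subproblem referenced above, with notation assumed here rather than taken from the abstract: $F_i$ denotes device $i$'s local objective, $w^t$ the current global model, and $\mu > 0$ the proximal coefficient. Each sampled device approximately solves
\[
w_i^{t+1} \approx \arg\min_{w} \; h_i(w; w^t) := F_i(w) + \frac{\mu}{2}\,\|w - w^t\|^2 ,
\]
and the server aggregates the returned local solutions to form $w^{t+1}$.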