We study a family of algorithms, which we refer to as local update methods, that generalizes many federated learning and meta-learning algorithms. We prove that for quadratic models, local update methods are equivalent to first-order optimization on a surrogate loss that we exactly characterize. Moreover, fundamental algorithmic choices (such as learning rates) explicitly govern a trade-off between the condition number of the surrogate loss and its alignment with the true loss. We derive novel convergence rates showcasing these trade-offs and highlight their importance in communication-limited settings. Using these insights, we can compare local update methods by their convergence/accuracy trade-off, not just by their convergence to critical points of the empirical loss. Our results shed new light on a broad range of phenomena, including the efficacy of server momentum in federated learning and the impact of proximal client updates.
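To make the quadratic-case claim concrete, the following sketch works out an illustrative special case: a single client running $K$ deterministic gradient steps with client learning rate $\gamma$ on a quadratic loss. The notation here ($A$, $b$, $\gamma$, $K$, $x_0$) is introduced only for this sketch, and the surrogate characterized in the paper (stochastic gradients, multiple clients, server-side updates) is more general and may take a different form. With
\[
  f(x) = \tfrac{1}{2}\, x^\top A x - b^\top x, \qquad x^\star = A^{-1} b,
\]
$K$ local gradient steps from $x_0$ yield
\[
  x_K = (I - \gamma A)^K x_0 + \bigl(I - (I - \gamma A)^K\bigr) x^\star,
  \qquad
  x_K - x_0 = -\bigl(I - (I - \gamma A)^K\bigr)(x_0 - x^\star),
\]
which is exactly one gradient descent step with unit step size on the surrogate quadratic
\[
  \tilde f(x) = \tfrac{1}{2}\,(x - x^\star)^\top \tilde A\,(x - x^\star),
  \qquad
  \tilde A = I - (I - \gamma A)^K .
\]
For $\gamma$ in the stable range ($0 < \gamma\lambda < 1$ for every eigenvalue $\lambda$ of $A$), increasing $\gamma$ or $K$ pushes the eigenvalues of $\tilde A$ toward $1$, which tends to improve the surrogate's conditioning; with multiple heterogeneous clients the surrogate's minimizer additionally drifts away from the true minimizer, which is the source of the conditioning/alignment trade-off described above.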