Federated learning is an important scenario in distributed learning, in which the goal is to learn from heterogeneous local datasets efficiently in terms of both communication and computational cost. In this paper, we study a new local algorithm called Bias-Variance Reduced Local SGD (BVR-L-SGD) for nonconvex federated learning. A novelty of this paper lies in the analysis of our bias- and variance-reduced local gradient estimators, which fully utilizes the small second-order heterogeneity of the local objectives and suggests randomly picking a single local model, rather than averaging all of them, when workers are synchronized. Under small heterogeneity of the local objectives, we show that our method achieves lower communication complexity than both previous non-local and local methods for general nonconvex objectives. Furthermore, we compare the total execution time, that is, the sum of the total communication time and the total computational time per worker, and show that our method outperforms existing methods when the heterogeneity is small and a single communication round is more time consuming than a single stochastic gradient computation. Numerical results are provided to verify our theoretical findings and give empirical evidence of the superiority of our algorithm.
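To make the synchronization pattern concrete, below is a minimal sketch of one communication round in the style the abstract describes: each worker runs local steps from a shared model, and the server then picks one local model uniformly at random instead of averaging. The `Worker` class, the least-squares objectives, and the SPIDER/SARAH-style recursive estimator are illustrative assumptions standing in for the paper's exact bias- and variance-reduced estimator, whose precise form is given in the paper.

```python
import numpy as np

class Worker:
    """Toy local objective: least squares on a private dataset
    (a hypothetical stand-in for a federated worker)."""
    def __init__(self, A, b):
        self.A, self.b = A, b
        self.n_samples = len(b)

    def full_grad(self, x):
        return self.A.T @ (self.A @ x - self.b) / self.n_samples

    def stoch_grad(self, x, i):
        a = self.A[i]
        return a * (a @ x - self.b[i])

def local_round(workers, model, local_steps, lr, rng):
    # Anchor the estimator with the exact gradient averaged over
    # all workers at the shared synchronization point.
    global_grad = np.mean([w.full_grad(model) for w in workers], axis=0)
    local_models = []
    for w in workers:
        x, v = model.copy(), global_grad.copy()
        for _ in range(local_steps):
            x_new = x - lr * v
            i = rng.integers(w.n_samples)
            # Recursive correction (SPIDER/SARAH-style) keeps the
            # estimator's bias and variance small along the local path;
            # the paper's estimator differs in its exact construction.
            v = v + w.stoch_grad(x_new, i) - w.stoch_grad(x, i)
            x = x_new
        local_models.append(x)
    # Synchronize by picking ONE local model uniformly at random,
    # rather than averaging, as the abstract suggests.
    return local_models[rng.integers(len(workers))]

# Usage: a few rounds on synthetic heterogeneous data.
rng = np.random.default_rng(0)
workers = [Worker(rng.normal(size=(50, 10)), rng.normal(size=50))
           for _ in range(4)]
x = np.zeros(10)
for _ in range(20):
    x = local_round(workers, x, local_steps=10, lr=0.05, rng=rng)
```

The random pick keeps the returned model an exact iterate of some worker's local trajectory, which is the property the analysis in the abstract exploits under small second-order heterogeneity.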