There is a growing interest in the distributed optimization framework that goes under the name of Federated Learning (FL). In particular, much attention is being devoted to FL scenarios where the network is strongly heterogeneous in terms of communication resources (e.g., bandwidth) and data distribution. In these cases, communication between local machines (agents) and the central server (Master) is a major concern. In this work, we present SHED, an original communication-constrained Newton-type (NT) algorithm designed to accelerate FL in such heterogeneous scenarios. SHED is by design robust to non-i.i.d. data distributions, handles heterogeneity of agents' communication resources (CRs), requires only sporadic Hessian computations, and achieves super-linear convergence. This is made possible by an incremental strategy, based on the eigendecomposition of the local Hessian matrices, which exploits (possibly) outdated second-order information. The proposed solution is thoroughly validated on real datasets by assessing (i) the number of communication rounds required for convergence, (ii) the overall amount of transmitted data, and (iii) the number of local Hessian computations. For all these metrics, the proposed approach shows superior performance with respect to state-of-the-art techniques such as GIANT and FedNL.