Federated learning (FL) is a technique for distributed machine learning (ML), in which edge devices carry out local model training on their individual datasets. In traditional FL algorithms, trained models at the edge are periodically sent to a central server for aggregation, utilizing a star topology as the underlying communication graph. However, assuming access to a central coordinator is not always practical, e.g., in ad hoc wireless network settings. In this paper, we develop a novel methodology for fully decentralized FL, where in addition to local training, devices conduct model aggregation via cooperative consensus formation with their one-hop neighbors over the underlying decentralized physical network. We further eliminate the need for a timing coordinator by introducing asynchronous, event-triggered communications among the devices. In doing so, to account for the inherent resource heterogeneity challenges in FL, we define personalized communication triggering conditions at each device that weigh the change in local model parameters against the available local resources. We theoretically demonstrate that our methodology converges to the globally optimal learning model at a $O\big(\frac{\ln k}{\sqrt{k}}\big)$ rate under standard assumptions in distributed learning and consensus literature. Our subsequent numerical evaluations demonstrate that our methodology achieves substantial improvements in convergence speed and/or communication savings compared with existing decentralized FL baselines.
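To make the described mechanism concrete, the following is a minimal sketch of event-triggered, consensus-based decentralized FL. It is an illustrative assumption rather than the paper's actual algorithm or threshold design: the names (`local_gradient`, `resource_budget`, `mix`) and the specific decaying threshold form are hypothetical placeholders.

```python
import numpy as np

def decentralized_fl(models, neighbors, local_gradient, resource_budget,
                     rounds=100, lr=0.05, mix=0.5):
    """Hypothetical sketch of event-triggered decentralized FL over a fixed graph.

    models: dict device -> parameter vector (np.ndarray)
    neighbors: dict device -> list of one-hop neighbor ids
    local_gradient: fn(device, params) -> stochastic gradient on local data
    resource_budget: dict device -> scalar in (0, 1]; a smaller budget yields a
        higher triggering threshold, i.e., less frequent communication.
    """
    last_broadcast = {i: m.copy() for i, m in models.items()}
    for k in range(1, rounds + 1):
        # 1) Local training: one SGD step on each device's own dataset.
        for i in models:
            models[i] = models[i] - lr * local_gradient(i, models[i])

        # 2) Event-triggered broadcast: a device transmits only if its model has
        #    drifted enough since its last broadcast, relative to its resources.
        broadcasts = {}
        for i in models:
            drift = np.linalg.norm(models[i] - last_broadcast[i])
            threshold = (1.0 / resource_budget[i]) / np.sqrt(k)  # assumed decaying form
            if drift > threshold:
                broadcasts[i] = models[i].copy()
                last_broadcast[i] = models[i].copy()

        # 3) Consensus aggregation: mix with models received this round from
        #    one-hop neighbors that triggered a broadcast.
        for i in models:
            received = [broadcasts[j] for j in neighbors[i] if j in broadcasts]
            if received:
                models[i] = (1 - mix) * models[i] + mix * np.mean(received, axis=0)
    return models
```

In this sketch, no central server or global clock is required: each device acts only on its own drift measurement and on whatever its one-hop neighbors happen to broadcast in a given round, which mirrors the asynchronous, coordinator-free setting described above.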