Federated Learning is an emerging learning paradigm that allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions. Despite its success, federated learning faces several challenges related to its decentralized nature. In this work, we develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles, namely (i) data heterogeneity, i.e., data distributions can vary substantially across clients, and (ii) system heterogeneity, i.e., the computational power of the clients could differ significantly. Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client. Furthermore, our method mitigates the effects of stragglers by adaptively selecting clients based on their computational characteristics and statistical significance, thus achieving, for the first time, near-optimal sample complexity and provable logarithmic speedup. Experimental results support our theoretical findings, demonstrating the superiority of our method over alternative personalized federated learning schemes in system- and data-heterogeneous environments.
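To make the shared-representation idea concrete, the following is a minimal sketch, not the authors' released code: each client keeps a personal head while a global linear representation is updated from averaged client gradients. The linear model, the synthetic least-squares data, and the speed-weighted client sampling (a simple stand-in for the paper's adaptive, straggler-aware selection rule) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_clients, n_samples = 20, 3, 10, 50

# Synthetic ground truth: a shared representation B_true and per-client heads.
B_true = np.linalg.qr(rng.normal(size=(d, k)))[0]
heads_true = rng.normal(size=(n_clients, k))
data = []
for i in range(n_clients):
    X = rng.normal(size=(n_samples, d))
    y = X @ B_true @ heads_true[i] + 0.01 * rng.normal(size=n_samples)
    data.append((X, y))

B = np.linalg.qr(rng.normal(size=(d, k)))[0]    # global common representation
heads = np.zeros((n_clients, k))                # user-specific parameters
speeds = rng.uniform(0.2, 1.0, size=n_clients)  # proxy for compute power

for rnd in range(200):
    # Straggler-aware sampling (illustrative): faster clients drawn more often.
    probs = speeds / speeds.sum()
    chosen = rng.choice(n_clients, size=5, replace=False, p=probs)
    grads_B = np.zeros_like(B)
    for i in chosen:
        X, y = data[i]
        # Local step: exact least-squares update of the personal head, B frozen.
        Z = X @ B
        heads[i] = np.linalg.lstsq(Z, y, rcond=None)[0]
        # Gradient of the squared loss w.r.t. the shared representation.
        resid = Z @ heads[i] - y
        grads_B += X.T @ np.outer(resid, heads[i]) / n_samples
    # Server step: average representation gradients, then re-orthonormalize.
    B = np.linalg.qr(B - 0.01 * grads_B / len(chosen))[0]

err = [np.mean((X @ B @ w - y) ** 2) for (X, y), w in zip(data, heads)]
print(f"mean personalized training MSE: {np.mean(err):.4f}")
```

The alternating structure, local head updates on frozen features followed by an averaged server update of the representation, is the point of the sketch; the closed-form head solve and the fixed sampling probabilities are simplifications of what an actual deployment would use.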