Federated Learning (FL) has been gaining significant traction across different ML tasks, ranging from vision to keyboard prediction. In large-scale deployments, client heterogeneity is a fact of life and constitutes a primary problem for fairness, training performance, and accuracy. Although significant effort has been devoted to tackling statistical data heterogeneity, the diversity in the processing capabilities and network bandwidth of clients, termed system heterogeneity, has remained largely unexplored. Current solutions either disregard a large portion of available devices or cap the model's capacity at a uniform level dictated by the least capable participants. In this work, we introduce Ordered Dropout, a mechanism that achieves an ordered, nested representation of knowledge in deep neural networks (DNNs) and enables the extraction of lower-footprint submodels without the need for retraining. We further show that, for linear maps, Ordered Dropout is equivalent to SVD. We employ this technique, along with a self-distillation methodology, in the realm of FL in a framework called FjORD. FjORD alleviates the problem of client system heterogeneity by tailoring the model width to each client's capabilities. Extensive evaluation on both CNNs and RNNs across diverse modalities shows that FjORD consistently leads to significant performance gains over state-of-the-art baselines, while maintaining its nested structure.
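To make the nested-submodel idea concrete, below is a minimal NumPy sketch (not the authors' implementation) of how an ordered, width-based dropout could restrict a single linear layer to its first ⌈p·K⌉ units; the layer sizes and function names are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Illustrative sketch (not the FjORD code): Ordered Dropout on one linear
# layer. Instead of dropping random units, a width fraction p keeps only the
# *first* ceil(p * K) output units, so smaller submodels are nested inside
# larger ones and can be extracted without retraining.

rng = np.random.default_rng(0)
K_in, K_out = 16, 8                      # hypothetical layer dimensions
W = rng.standard_normal((K_out, K_in))   # full-width weight matrix
b = rng.standard_normal(K_out)

def ordered_dropout_forward(x, p):
    """Forward pass through the layer at width fraction p in (0, 1]."""
    k = max(1, int(np.ceil(p * K_out)))  # number of leading units to keep
    return W[:k] @ x + b[:k]             # nested submodel: first k rows only

x = rng.standard_normal(K_in)
y_quarter = ordered_dropout_forward(x, 0.25)  # low-footprint client
y_full    = ordered_dropout_forward(x, 1.0)   # full-capacity client

# Nested property: the small submodel's output is a prefix of the full one.
assert np.allclose(y_full[: y_quarter.shape[0]], y_quarter)
```

In this sketch, a resource-constrained client would train and run only the leading slice of each layer, while more capable clients use wider slices of the same shared weights.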