Federated Learning (FL) has been gaining significant traction across different ML tasks, ranging from vision to keyboard prediction. In large-scale deployments, client heterogeneity is a fact and constitutes a primary problem for fairness, training performance and accuracy. Although significant efforts have been devoted to tackling statistical data heterogeneity, the diversity in the processing capabilities and network bandwidth of clients, termed system heterogeneity, has remained largely unexplored. Current solutions either disregard a large portion of available devices or set a uniform limit on the model's capacity, restricted by the least capable participants. In this work, we introduce Ordered Dropout, a mechanism that achieves an ordered, nested representation of knowledge in Neural Networks and enables the extraction of lower-footprint submodels without the need for retraining. We further show that, for linear maps, our Ordered Dropout is equivalent to SVD. We employ this technique, along with a self-distillation methodology, in the realm of FL in a framework called FjORD. FjORD alleviates the problem of client system heterogeneity by tailoring the model width to the client's capabilities. Extensive evaluation on both CNNs and RNNs across diverse modalities shows that FjORD consistently leads to significant performance gains over state-of-the-art baselines, while maintaining its nested structure.
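To make the nesting idea concrete, below is a minimal sketch of how an ordered, nested representation can be obtained by keeping only the first ceil(p * K) units of a layer for a width multiplier p. This is illustrative only: the class name `OrderedDropoutLinear` and the masking approach are assumptions for the example, whereas in the paper's setting less capable clients would compute a physically truncated submodel rather than masking a full-width one.

```python
# A minimal sketch of the nested-submodel idea behind Ordered Dropout,
# assuming a plain fully connected layer. Names and the masking trick
# are illustrative, not the paper's implementation.
import math
import torch
import torch.nn as nn


class OrderedDropoutLinear(nn.Linear):
    """Linear layer whose first ceil(p * out_features) output units
    form a self-contained submodel for every width multiplier p."""

    def forward(self, x: torch.Tensor, p: float = 1.0) -> torch.Tensor:
        out = super().forward(x)
        keep = max(1, math.ceil(p * self.out_features))
        # Zero out the trailing units: smaller-p submodels are nested
        # inside larger ones, so they can be extracted without retraining.
        mask = torch.zeros(self.out_features, device=out.device)
        mask[:keep] = 1.0
        return out * mask


# Usage: each client evaluates the layer at the largest width p it can afford.
layer = OrderedDropoutLinear(16, 8)
x = torch.randn(4, 16)
full = layer(x, p=1.0)   # full-width model
half = layer(x, p=0.5)   # nested submodel using the first 4 output units
assert torch.allclose(full[:, :4], half[:, :4])  # shared, ordered representation
```

The key property the sketch demonstrates is that the retained units are always the leading ones, so every narrower submodel shares its parameters (and activations) with every wider one; this is what allows FjORD to serve clients of different capabilities from a single set of weights.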