When the available hardware cannot meet the memory and compute requirements to efficiently train high-performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often battery-powered, severely limiting the sophistication of models that can be trained under this paradigm. While most research has focused on designing better aggregation strategies to improve convergence rates and on alleviating the communication costs of FL, fewer efforts have been devoted to accelerating on-device training. This stage, which repeats hundreds of times (i.e. every round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and for the totality of the energy consumption on the client side. In this work, we present the first study of the unique aspects that arise when introducing sparsity at training time in FL workloads. We then propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training. Models trained with ZeroFL and 95% sparsity achieve up to 2.3% higher accuracy compared to competitive baselines obtained from adapting a state-of-the-art sparse training framework to the FL setting.
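To make the idea of sparse on-device training concrete, the sketch below shows one client's local update in which weight tensors are masked to a target sparsity (e.g. 95%) before each forward/backward pass. This is a minimal illustration under assumed choices (magnitude-based top-k masking, plain SGD, a hypothetical `topk_magnitude_mask` helper), not the exact ZeroFL procedure described in the paper.

```python
import torch
import torch.nn as nn


def topk_magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the (1 - sparsity) fraction of largest-magnitude
    entries of `weight` (hypothetical helper, for illustration only)."""
    n = weight.numel()
    k = max(1, int(n * (1.0 - sparsity)))          # number of weights to keep
    threshold = weight.abs().flatten().kthvalue(n - k + 1).values
    return (weight.abs() >= threshold).to(weight.dtype)


def sparse_local_update(model: nn.Module, loader, sparsity=0.95, lr=0.01, epochs=1):
    """One client's local training round with sparsified weights.

    Illustrative sketch of sparse training in an FL client: most entries of
    each weight matrix are zeroed before every step, so the bulk of the
    multiply-accumulates involve zeros and can be skipped by sparse kernels.
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                for p in model.parameters():
                    if p.dim() > 1:                # sparsify weight matrices, not biases
                        p.mul_(topk_magnitude_mask(p, sparsity))
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    # The (sparse) state dict would then be sent back to the server for aggregation.
    return {k: v.detach().clone() for k, v in model.state_dict().items()}
```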