Federated learning (FL) has attracted increasing attention as a promising approach to empowering a vast number of end devices with artificial intelligence. However, guaranteeing the efficiency of FL is challenging given the unreliable nature of end devices and the non-negligible cost of device-server communication. In this paper, we propose SAFA, a semi-asynchronous FL protocol, to address problems in federated learning such as low round efficiency and poor convergence rate under extreme conditions (e.g., clients frequently dropping offline). We introduce novel designs in the steps of model distribution, client selection, and global aggregation to mitigate the impact of stragglers, crashes, and model staleness, thereby boosting efficiency and improving the quality of the global model. Extensive experiments on typical machine learning tasks demonstrate that the proposed protocol is effective in shortening federated round duration, reducing local resource wastage, and improving the accuracy of the global model at an acceptable communication cost.
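To make the semi-asynchronous idea concrete, the following is a minimal sketch of staleness-tolerant aggregation, one plausible reading of merging client updates within a staleness bound. All names here (ClientUpdate, aggregate, tau) are hypothetical illustrations, not SAFA's actual algorithm or API.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical sketch: updates based on a recent-enough global version
# (lag <= tau) are merged, weighted by local data size; staler ones are
# dropped so that very old work cannot pollute the global model.

@dataclass
class ClientUpdate:
    weights: np.ndarray   # flattened local model parameters
    n_samples: int        # size of the client's local dataset
    version: int          # global-model version the update was based on

def aggregate(global_weights, global_version, updates, tau=2):
    fresh = [u for u in updates if global_version - u.version <= tau]
    if not fresh:
        # No usable updates this round; keep the current global model.
        return global_weights, global_version
    sizes = np.array([u.n_samples for u in fresh], dtype=float)
    coef = sizes / sizes.sum()
    new_weights = sum(c * u.weights for c, u in zip(coef, fresh))
    return new_weights, global_version + 1

# Usage: two fresh updates are merged; the version-1 update is too
# stale for tau=2 and is discarded.
g = np.zeros(3)
ups = [ClientUpdate(np.ones(3), 100, 5),
       ClientUpdate(2 * np.ones(3), 300, 4),
       ClientUpdate(9 * np.ones(3), 50, 1)]
print(aggregate(g, 5, ups))
```

The staleness bound is what makes the scheme "semi"-asynchronous: clients need not all report in lockstep as in synchronous FL, yet unbounded staleness, which harms convergence in fully asynchronous schemes, is still ruled out.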