Federated Learning (FL) has emerged as a de facto area of machine learning and has received rapidly increasing research interest from the community. However, catastrophic forgetting caused by data heterogeneity and partial participation poses distinctive challenges for FL and is detrimental to performance. To tackle these problems, we propose a new FL approach (named GradMA), which takes inspiration from continual learning to simultaneously correct the server-side and worker-side update directions and to take full advantage of the server's rich computing and memory resources. Furthermore, we elaborate a memory reduction strategy that enables GradMA to accommodate FL with a large number of workers. We then analyze the convergence of GradMA theoretically in the smooth non-convex setting and show that its convergence rate achieves a linear speedup w.r.t. the number of sampled active workers. Finally, our extensive experiments on various image classification tasks show that GradMA achieves significant gains in accuracy and communication efficiency compared to SOTA baselines.
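To give intuition for the continual-learning-inspired correction of update directions mentioned above, the following is a minimal illustrative sketch, not the paper's exact procedure: it assumes a single memorized gradient and applies an A-GEM-style projection that removes the component of a proposed update that conflicts with the memorized direction. The function name and the toy vectors are hypothetical.

```python
import numpy as np

def correct_direction(grad: np.ndarray, mem_grad: np.ndarray) -> np.ndarray:
    """Correct an update direction against one memorized gradient.

    If `grad` points against `mem_grad` (negative inner product), remove the
    conflicting component so the corrected update no longer increases the loss
    associated with the memorized direction; otherwise return it unchanged.
    """
    dot = float(np.dot(grad, mem_grad))
    if dot < 0.0:
        grad = grad - (dot / (float(np.dot(mem_grad, mem_grad)) + 1e-12)) * mem_grad
    return grad

# Toy usage: a proposed update that conflicts with the memorized gradient.
g = np.array([1.0, -1.0])
g_mem = np.array([0.0, 1.0])
print(correct_direction(g, g_mem))  # [1. 0.] -- conflicting component removed
```

In a full method, such a correction would typically be applied with multiple memorized gradients (e.g., via a small quadratic program) on both the server and worker sides; this sketch only shows the single-constraint case for clarity.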