Federated Learning (FL) has become an emerging norm for distributed model training: multiple devices cooperatively train a shared model on their own datasets, coordinated by a central server, while keeping private data local. However, the non-independent-and-identically-distributed (Non-IID) data generated on heterogeneous clients and the frequent communication across participants can significantly degrade training performance, slow convergence, and increase communication cost. In this paper, we improve the standard stochastic gradient descent approach by incorporating the aggregated gradients at each local update epoch, and we propose an adaptive-learning-rate iterative algorithm that further accounts for the deviation between the local and global parameters. This adaptive learning rate design requires the local information of all clients, which is challenging to obtain since there is no communication during the local update epochs. To derive a decentralized adaptive learning rate for each client, we introduce a mean-field approach that uses two mean-field terms to estimate the average local parameters and gradients over time, respectively, without requiring clients to exchange local information with each other. Through theoretical analysis, we prove that our method guarantees convergence of model training and derive a convergent upper bound on the client drift term. Extensive numerical results show that our proposed framework outperforms state-of-the-art FL schemes in both model accuracy and convergence rate on real-world datasets with IID and Non-IID data distributions.
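To make the idea concrete, the following is a minimal sketch of one client's local update combining an aggregated-gradient term with a deviation-scaled learning rate. All specifics here (the function `local_update`, the blending weight `beta`, the base rate `eta0`, and the particular scaling rule) are illustrative assumptions, not the paper's exact update rule.

```python
import numpy as np

def local_update(w_local, w_global, grad_fn, agg_grad,
                 eta0=0.1, beta=0.5, epochs=5):
    """Illustrative local training step for one FL client.

    SGD augmented with an aggregated-gradient term, where the learning
    rate shrinks as the client's parameters drift from the global model.
    All hyperparameters and the scaling rule are hypothetical choices
    for illustration only.
    """
    w = w_local.copy()
    for _ in range(epochs):
        # Blend the client's own gradient with the aggregated gradient.
        g = grad_fn(w) + beta * agg_grad
        # Damp the step size as the local-global deviation grows.
        drift = np.linalg.norm(w - w_global)
        eta = eta0 / (1.0 + drift)
        w = w - eta * g
    return w
```

On a simple quadratic objective, this update pulls the local parameters toward the minimizer while the drift-dependent rate keeps steps conservative when the client has wandered far from the global model, which is the intuition behind bounding the client drift term.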