This study investigates clustered federated learning (FL), a formulation of FL with non-i.i.d. data in which the devices are partitioned into clusters and each cluster fits its data optimally with a localized model. We propose a clustered FL framework that applies a nonconvex penalty to pairwise differences of parameters. This framework can automatically identify cluster structures without a priori knowledge of either the number of clusters or the set of devices in each cluster. To implement the proposed framework, we introduce a novel clustered FL method called Fusion Penalized Federated Clustering (FPFC). Building on the standard alternating direction method of multipliers (ADMM), FPFC runs in parallel, updates only a subset of devices at each communication round, and allows a variable workload per device. These strategies significantly reduce the communication cost while preserving privacy, making FPFC practical for FL. We also propose a new warmup strategy for hyperparameter tuning in FL settings and explore an asynchronous variant of FPFC (asyncFPFC). Our theoretical analysis provides convergence guarantees for FPFC with general nonconvex losses and establishes the statistical convergence rate under a linear model with squared loss. Extensive experiments demonstrate the advantages of FPFC over existing methods, including robustness and generalization capability.
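As a rough illustration (the notation below is ours, not necessarily the paper's), a fusion-penalized clustered FL objective of this kind can be written as

\[
\min_{w_1,\dots,w_m}\; \sum_{i=1}^{m} f_i(w_i) \;+\; \lambda \sum_{1 \le i < j \le m} p_{\gamma}\!\big(\lVert w_i - w_j \rVert_2\big),
\]

where \(f_i\) is the local loss of device \(i\), \(w_i\) its model parameters, and \(p_{\gamma}\) a nonconvex penalty such as MCP or SCAD. Because such a penalty can shrink small pairwise differences exactly to zero without over-penalizing large ones, devices whose parameters are fused into a common vector form a cluster, with no need to prespecify the number of clusters.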
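To make the algorithmic ingredients concrete, the following is a hypothetical sketch of a pairwise-fusion ADMM with partial device participation and an MCP penalty on a toy linear model with squared loss. It is not the paper's actual FPFC updates; the helper mcp_prox, the Jacobi-style local solve, and all hyperparameter values are illustrative assumptions.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Toy data: m devices drawn from two latent clusters, linear model, squared loss.
m, d, n = 6, 5, 40
truth = [rng.normal(size=d), rng.normal(size=d)]
data = []
for i in range(m):
    X = rng.normal(size=(n, d))
    y = X @ truth[i % 2] + 0.1 * rng.normal(size=n)
    data.append((X, y))

def mcp_prox(v, lam, gamma, rho):
    """Prox of the group-MCP fusion penalty at step 1/rho (requires gamma*rho > 1).
    MCP is one illustrative nonconvex penalty; the paper may use another."""
    t = np.linalg.norm(v)
    if t <= gamma * lam:
        s = max(t - lam / rho, 0.0) / (1.0 - 1.0 / (gamma * rho))
        return (s / t) * v if t > 0 else v
    return v  # MCP is flat beyond gamma*lam, so large differences are untouched

lam, gamma, rho, rounds, frac = 0.5, 3.0, 1.0, 50, 0.5
W = np.zeros((m, d))                          # one parameter vector per device
pairs = list(itertools.combinations(range(m), 2))
Delta = {p: np.zeros(d) for p in pairs}       # auxiliary pairwise differences
U = {p: np.zeros(d) for p in pairs}           # scaled dual variables

for _ in range(rounds):
    # Partial participation: only a random subset of devices updates this round.
    active = rng.choice(m, size=max(1, int(frac * m)), replace=False)
    W_old = W.copy()
    for i in active:
        X, y = data[i]
        # Jacobi-style local update: a ridge-type solve against the neighbors'
        # previous parameters, so all active devices can run in parallel.
        A = X.T @ X / len(y) + rho * (m - 1) * np.eye(d)
        b = X.T @ y / len(y)
        for j in range(m):
            if j == i:
                continue
            p, sign = ((i, j), 1.0) if i < j else ((j, i), -1.0)
            b += rho * (W_old[j] + sign * (Delta[p] - U[p]))
        W[i] = np.linalg.solve(A, b)
    # Server-side steps: prox on each pairwise difference, then dual ascent.
    for (i, j) in pairs:
        Delta[(i, j)] = mcp_prox(W[i] - W[j] + U[(i, j)], lam, gamma, rho)
        U[(i, j)] += W[i] - W[j] - Delta[(i, j)]

# Pairs whose auxiliary difference is driven to (numerically) zero are treated
# as fused, inducing a cluster assignment without prespecifying cluster counts.
fused = [p for p in pairs if np.linalg.norm(Delta[p]) < 1e-6]
print("fused pairs:", fused)
```

In this sketch only the sampled devices perform a local solve each round, mirroring the partial-participation strategy described in the abstract, while the prox and dual steps act on pairwise differences and would naturally sit on the server.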