In Federated Learning (FL), clients learn a single global model (FedAvg) through a central aggregator. In this setting, the non-IID distribution of data across clients prevents the global FL model from delivering good performance on each client's local data. Personalized FL aims to address this problem by finding a personalized model for each client. Recent works widely report the average personalized-model accuracy on a particular data split of a dataset to evaluate the effectiveness of their methods. However, given the multitude of personalization approaches proposed, it is critical to study the per-user personalized accuracy and the accuracy improvements across users under an equitable notion of fairness. To address these issues, we present a set of performance and fairness metrics intended to assess the quality of personalized FL methods. We apply these metrics to four recently proposed personalized FL methods, PersFL, FedPer, pFedMe, and Per-FedAvg, on three different data splits of the CIFAR-10 dataset. Our evaluations show that the personalized model with the highest average accuracy across users is not necessarily the fairest. Our code is available at https://tinyurl.com/1hp9ywfa for public use.
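To make the distinction between average accuracy and fairness concrete, the following is a minimal sketch, not the paper's actual metrics: it computes per-user accuracy improvements and Jain's fairness index over clients' personalized accuracies. The function names and the example accuracy values are hypothetical, chosen only to illustrate that two methods with the same mean accuracy can differ in how equitably that accuracy is distributed.

```python
# Illustrative sketch (assumed metrics, not the paper's exact definitions):
# per-user accuracy improvement and Jain's fairness index over clients.

def jains_fairness_index(accuracies):
    """Jain's index over per-client accuracies: equals 1.0 when all
    clients score equally, and approaches 1/n as the accuracy
    concentrates on a single client."""
    n = len(accuracies)
    total = sum(accuracies)
    return (total * total) / (n * sum(a * a for a in accuracies))

def per_user_improvement(personalized_acc, global_acc):
    """Per-client gain of the personalized model over the global model."""
    return [p - g for p, g in zip(personalized_acc, global_acc)]

# Two hypothetical methods with identical average accuracy (0.80):
method_a = [0.80, 0.80, 0.80, 0.80]   # uniform across clients
method_b = [0.95, 0.95, 0.65, 0.65]   # same mean, less equitable

print(jains_fairness_index(method_a))  # 1.0
print(jains_fairness_index(method_b))  # below 1.0, reflecting inequity
```

This is why per-user metrics matter: averaging over clients would rank the two hypothetical methods identically, while a fairness index separates them.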