Federated Learning (FL) is a distributed machine learning paradigm in which clients collaboratively train a model using their local (human-generated) datasets. While existing studies focus on developing FL algorithms that tackle data heterogeneity across clients, the important issue of data quality in FL (e.g., label noise) has been overlooked. This paper aims to fill this gap by providing a quantitative study of the impact of label noise on FL. We derive an upper bound on the generalization error that is linear in the clients' label noise level. We then conduct experiments on the MNIST and CIFAR-10 datasets using various FL algorithms. Our empirical results show that the global model accuracy decreases linearly as the noise level increases, which is consistent with our theoretical analysis. We further find that label noise slows down the convergence of FL training and that the global model tends to overfit when the noise level is high.
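For concreteness, the sketch below illustrates one common way a per-client label noise level can be simulated in such experiments; it is not taken from the paper, and the symmetric label-flipping model, function names, and noise levels used here are assumptions for illustration only.

```python
import numpy as np

def add_symmetric_label_noise(labels, noise_level, num_classes, rng=None):
    """Flip each label to a uniformly random *other* class with probability `noise_level`.

    Illustrative sketch of a symmetric noise model; the paper may use a
    different noise model (e.g., pair flipping or class-dependent noise).
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels).copy()
    flip_mask = rng.random(labels.shape[0]) < noise_level
    # Shift flipped labels by a random nonzero offset so the new class differs from the original.
    random_offsets = rng.integers(1, num_classes, size=int(flip_mask.sum()))
    labels[flip_mask] = (labels[flip_mask] + random_offsets) % num_classes
    return labels

# Hypothetical usage: assign each simulated client its own noise level before local training.
client_noise_levels = [0.0, 0.2, 0.4]
clean_labels = np.random.default_rng(0).integers(0, 10, size=100)  # 10 classes, as in MNIST/CIFAR-10
noisy_labels_per_client = [
    add_symmetric_label_noise(clean_labels, eps, num_classes=10)
    for eps in client_noise_levels
]
```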