Federated learning (FL) aims to train a global model on the server while the training data are collected and stored on local devices. In practice, the labels are therefore annotated by clients with varying expertise or criteria and thus contain different amounts of noise. Local training on noisy labels easily overfits to them, which is devastating to the global model after aggregation. Although recent robust FL methods take malicious clients into account, they do not address local noisy labels on each device or their impact on the global model. In this paper, we develop a simple two-level sampling method, "FedNoiL", that (1) selects clients on the server for more robust global aggregation, and (2) selects clean labels and correct pseudo-labels on each client for more robust local training. The sampling probabilities are built upon clean-label detection by the global model. Moreover, we investigate different schedules for the number of local epochs between aggregations over the course of FL, which notably improves communication and computation efficiency in the noisy-label setting. In experiments with homogeneous/heterogeneous data distributions and noise ratios, we observe that direct combinations of SOTA FL methods with SOTA noisy-label learning methods easily fail, whereas our method consistently achieves better and more robust performance.
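The sketch below is a minimal, hypothetical illustration of the two-level sampling idea described above, not the authors' implementation. It assumes per-sample losses under the global model are available, uses a small-loss heuristic with a pooled loss threshold as a stand-in for clean-label detection, derives server-side client sampling probabilities from each client's estimated clean ratio, and selects likely-clean samples for local training. All function names (`clean_ratio`, `client_sampling_probs`, `select_local_samples`) and parameter choices are illustrative assumptions.

```python
import numpy as np

def clean_ratio(losses, threshold):
    """Small-loss heuristic (an illustrative assumption): samples whose
    global-model loss falls below `threshold` are treated as clean."""
    return float(np.mean(losses < threshold))

def client_sampling_probs(per_client_losses):
    """Level 1: server-side sampling probabilities, proportional to each
    client's estimated clean-label ratio under the global model."""
    threshold = np.median(np.concatenate(per_client_losses))  # pooled loss threshold
    ratios = np.array([clean_ratio(l, threshold) for l in per_client_losses])
    ratios = ratios + 1e-8  # avoid all-zero probabilities
    return ratios / ratios.sum(), threshold

def select_local_samples(losses, threshold):
    """Level 2: client-side selection of likely-clean samples for local training;
    the remaining samples could instead receive pseudo-labels (not shown here)."""
    return np.flatnonzero(losses < threshold)

# Toy usage: three clients with synthetic per-sample losses from the global model.
rng = np.random.default_rng(0)
per_client_losses = [rng.exponential(scale=s, size=20) for s in (0.5, 1.0, 2.0)]
probs, thr = client_sampling_probs(per_client_losses)
chosen = rng.choice(len(per_client_losses), size=2, replace=False, p=probs)
for c in chosen:
    ids = select_local_samples(per_client_losses[c], thr)
    print(f"client {c}: sampling prob {probs[c]:.2f}, {len(ids)} likely-clean samples")
```

In this toy setting, clients whose losses under the global model are lower (presumably less label noise) receive larger sampling probabilities, and within each selected client only the low-loss samples are kept for local training.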