Federated learning (FL) is a distributed framework for collaborative model training with privacy guarantees. In real-world scenarios, clients may hold non-IID data (local class imbalance) with poor annotation quality (label noise). The co-existence of label noise and class imbalance in FL's small local datasets renders both conventional FL methods and noisy-label learning methods ineffective. To address these challenges, we propose FedCNI, which requires no additional clean proxy dataset. It comprises a noise-resilient local solver and a robust global aggregator. For the local solver, we design a more robust prototypical noise detector to distinguish noisy samples. To further reduce the negative impact of noisy samples, we devise a curriculum pseudo-labeling method and a denoised Mixup training strategy. For the global aggregator, we propose a switching re-weighted aggregation method tailored to different learning periods. Extensive experiments demonstrate that our method substantially outperforms state-of-the-art solutions in mixed-heterogeneous FL environments.
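To make the idea of prototypical noise detection concrete, the following is a minimal sketch of one common formulation (assumed mechanics for illustration only, not FedCNI's exact detector): each class prototype is the mean feature vector of the samples carrying that label, and a sample whose feature is more similar to another class's prototype than to its own is flagged as potentially noisy.

```python
# Hypothetical sketch of prototype-based noisy-label detection.
# Assumptions (not from the paper): cosine similarity as the metric,
# prototypes computed as per-class feature means, hard argmax decision.
import numpy as np

def detect_noisy(features, labels, num_classes):
    """Return a boolean mask marking samples whose given label disagrees
    with the nearest class prototype under cosine similarity."""
    # L2-normalize features so dot products are cosine similarities.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    # One prototype per class: mean of the (possibly noisy) labeled features.
    protos = np.stack([feats[labels == c].mean(axis=0)
                       for c in range(num_classes)])
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sims = feats @ protos.T              # (N, C) cosine similarities
    return sims.argmax(axis=1) != labels # True => suspected noisy label
```

A real local solver would of course extract `features` from the current model's penultimate layer and combine this mask with the curriculum pseudo-labeling and Mixup steps described above; the sketch only shows the detection criterion itself.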