Federated learning (FL) is a promising approach for training models on decentralized data located on local client devices while improving efficiency and privacy. However, the distribution and quantity of the training data on the clients' side may lead to significant challenges such as class imbalance and non-IID (non-independent and identically distributed) data, which could greatly degrade the performance of the common model. While much effort has been devoted to helping FL models converge when encountering non-IID data, the imbalance issue has not been sufficiently addressed. In particular, as FL training is executed by exchanging gradients in an encrypted form, the training data is not completely observable to either clients or servers, and previous methods for handling class imbalance do not perform well in FL. Therefore, it is crucial to design new methods for detecting class imbalance in FL and mitigating its impact. In this work, we propose a monitoring scheme that can infer the composition of the training data in each FL round, and design a new loss function -- \textbf{Ratio Loss} -- to mitigate the impact of the imbalance. Our experiments demonstrate the importance of acknowledging class imbalance and taking measures as early as possible in FL training, as well as the effectiveness of our method in mitigating its impact. Our method is shown to significantly outperform previous methods, while maintaining client privacy.