Traditional federated optimization methods perform poorly on heterogeneous data (i.e., they suffer accuracy degradation), especially when the data are highly skewed. In this paper, we investigate label distribution skew in FL, where the distribution of labels varies across clients. First, we study label distribution skew from a statistical view. We demonstrate both theoretically and empirically that previous methods based on softmax cross-entropy are not suitable: they can cause local models to heavily overfit to minority classes and missing classes. Additionally, we theoretically introduce a deviation bound to measure the deviation of the gradient after the local update. Finally, we propose FedLC (\textbf{Fed}erated learning via \textbf{L}ogits \textbf{C}alibration), which calibrates the logits before the softmax cross-entropy according to the probability of occurrence of each class. FedLC applies a fine-grained calibrated cross-entropy loss in the local update by adding a pairwise label margin. Extensive experiments on federated and real-world datasets demonstrate that FedLC yields a more accurate global model and substantially improved performance. Furthermore, integrating other FL methods with our approach can further enhance the performance of the global model.
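To make the calibration idea concrete, the following is a minimal NumPy sketch of a class-frequency-dependent logit calibration applied before softmax cross-entropy. It is an illustrative approximation, not the exact FedLC loss: the `tau * n_j**(-1/4)` margin form and the function name `calibrated_cross_entropy` are our assumptions for exposition; the precise pairwise label margin follows the paper.

```python
import numpy as np

def calibrated_cross_entropy(logits, label, class_counts, tau=1.0):
    """Cross-entropy after subtracting a class-frequency-dependent
    margin from each logit. Rare classes (small counts) get a larger
    margin, forcing the model to learn a larger decision margin for
    them. The n_j**(-1/4) form is illustrative, not the exact FedLC
    pairwise margin."""
    counts = np.asarray(class_counts, dtype=float)
    margins = tau * counts ** (-0.25)   # larger margin for rarer classes
    z = logits - margins                # calibrated logits
    z = z - z.max()                     # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]
```

With identical raw logits, a sample from a rare class incurs a larger loss than one from a common class, so local updates on skewed clients push rare-class logits up more aggressively instead of overfitting to the locally dominant classes.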