Federated learning (FL) provides autonomy and privacy by design to participating peers, who cooperatively build a machine learning (ML) model while keeping their private data on their devices. However, that same autonomy opens the door for malicious peers to poison the model by conducting either untargeted or targeted poisoning attacks. The label-flipping (LF) attack is a targeted poisoning attack in which the attackers poison their training data by flipping the labels of some examples from one class (i.e., the source class) to another (i.e., the target class). Unfortunately, this attack is easy to perform and hard to detect, and it negatively impacts the performance of the global model. Existing defenses against LF are limited by assumptions on the distribution of the peers' data and/or do not perform well with high-dimensional models. In this paper, we investigate the LF attack behavior in depth and find that the contradicting objectives of attackers and honest peers on the source class examples are reflected in the parameter gradients corresponding to the neurons of the source and target classes in the output layer, making those gradients good discriminative features for attack detection. Accordingly, we propose a novel defense that first dynamically extracts those gradients from the peers' local updates, then clusters the extracted gradients, analyzes the resulting clusters, and filters out potentially bad updates before model aggregation. Extensive empirical analysis on three data sets shows the proposed defense's effectiveness against the LF attack regardless of the data distribution or model dimensionality. The proposed defense also outperforms several state-of-the-art defenses by offering lower test error, higher overall accuracy, higher source class accuracy, lower attack success rate, and higher stability of the source class accuracy.
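The defense pipeline described above (extract the output-layer gradients tied to the source and target classes, cluster them, filter, then aggregate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each peer update is modeled only by its output-layer gradient matrix (one row per class neuron); the two neurons with the largest total gradient norm across peers are assumed to stand in for the source and target neurons; plain 2-means is used for clustering; and keeping the larger cluster (an honest-majority rule) replaces the paper's cluster analysis. The names `extract_features`, `two_means`, and `filter_and_aggregate` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical model of a peer update: only the output-layer gradient matrix,
# one row per class neuron, one column per incoming weight.
Update = List[List[float]]


def extract_features(updates: List[Update], k: int = 2) -> Tuple[List[List[float]], List[int]]:
    """Pick the k output neurons with the largest total gradient norm across
    peers (an assumed proxy for the source/target neurons) and flatten their
    gradient rows into one feature vector per peer."""
    num_classes = len(updates[0])
    totals = [
        sum(math.sqrt(sum(g * g for g in upd[c])) for upd in updates)
        for c in range(num_classes)
    ]
    chosen = sorted(range(num_classes), key=lambda c: totals[c], reverse=True)[:k]
    features = [[g for c in chosen for g in upd[c]] for upd in updates]
    return features, chosen


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points: List[List[float]], iters: int = 20) -> List[int]:
    """Plain k-means with k=2; the first point and the point farthest from it
    serve as deterministic initial centers."""
    centers = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1 for p in points]
        # Move each center to the mean of its members.
        for j in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def filter_and_aggregate(updates: List[Update]):
    """Cluster the extracted features, keep the larger cluster (assumed
    honest-majority rule, not the paper's analysis step), and average it."""
    features, chosen = extract_features(updates)
    labels = two_means(features)
    keep = 0 if labels.count(0) >= labels.count(1) else 1
    kept = [i for i, lab in enumerate(labels) if lab == keep]
    aggregate = [
        [sum(updates[i][c][w] for i in kept) / len(kept) for w in range(len(updates[0][c]))]
        for c in range(len(updates[0]))
    ]
    return aggregate, kept, chosen


# Toy data: 3 classes, 2 weights per neuron; flipped peers push the
# source (class 0) and target (class 1) rows in the opposite direction.
honest = [[[1.0, 1.0], [-1.0, -1.0], [0.1, 0.0]],
          [[1.1, 0.9], [-1.0, -1.1], [0.0, 0.1]],
          [[0.9, 1.0], [-1.1, -0.9], [0.1, 0.1]]]
flipped = [[[-1.0, -1.0], [1.0, 1.0], [0.1, 0.0]],
           [[-0.9, -1.1], [1.1, 1.0], [0.0, 0.1]]]
updates = [honest[0], flipped[0], honest[1], honest[2], flipped[1]]
agg, kept, chosen = filter_and_aggregate(updates)
print(kept)  # [0, 2, 3]
```

In this toy run, the two flipped updates disagree with the honest ones on the source and target rows, so they form their own cluster and are excluded before averaging. Restricting the features to a few output neurons is what keeps the clustering cheap regardless of the model's overall dimensionality.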