Malicious clients can attack federated learning systems by injecting malicious data, such as backdoor samples, during the training phase. The compromised global model performs well on the validation dataset designed for the task, but a small subset of inputs carrying the backdoor pattern triggers the model into wrong predictions. An arms race has developed between attackers who try to conceal their attacks and defenders who try to detect them during the server-side aggregation stage of training. In this work, we propose a new and effective method to mitigate backdoor attacks after the training phase. Specifically, we design a federated pruning method that removes redundant neurons from the network and then adjusts the model's extreme weight values. Our experiments on distributed Fashion-MNIST show that our method reduces the average attack success rate from 99.7% to 1.9% at the cost of a 5.5% loss of test accuracy on the validation dataset. To minimize the impact of pruning on test accuracy, we can fine-tune the model after pruning; the attack success rate then drops to 6.4% with only a 1.7% loss of test accuracy. Further experiments under Distributed Backdoor Attacks on CIFAR-10 also show promising results: the average attack success rate drops by more than 70% with less than a 2% loss of test accuracy on the validation dataset.
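The defense summarized above combines two post-training steps: pruning neurons that contribute little on clean client data, then clamping extreme weight values. The following is a minimal sketch of how such a federated pruning and clipping step could look in PyTorch, under assumed details: each client reports average activations of one hidden layer on its local data, the server averages these reports, zeroes out the most dormant neurons, and clamps the remaining weights. The model, the activation-averaging scheme, and the helper names (`client_activation_stats`, `prune_dormant_neurons`, `clip_extreme_weights`) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class SmallNet(nn.Module):
    """Toy global model standing in for the trained federated model."""

    def __init__(self, in_dim=784, hidden=256, classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        return self.fc2(h)


def client_activation_stats(model, data):
    """Average post-ReLU activation per hidden neuron on one client's clean data."""
    with torch.no_grad():
        h = torch.relu(model.fc1(data))
    return h.mean(dim=0)  # shape: (hidden,)


def prune_dormant_neurons(model, avg_activation, prune_ratio=0.5):
    """Zero out the hidden neurons with the smallest server-aggregated activation."""
    k = int(prune_ratio * avg_activation.numel())
    idx = torch.argsort(avg_activation)[:k]  # least-activated (redundant) neurons
    with torch.no_grad():
        model.fc1.weight[idx] = 0.0
        model.fc1.bias[idx] = 0.0
        model.fc2.weight[:, idx] = 0.0


def clip_extreme_weights(model, clip=1.0):
    """Clamp extreme weight values so no single neuron dominates the prediction."""
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(-clip, clip)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = SmallNet()
    # Simulated per-client reports; real clients would use their local validation data.
    client_stats = [client_activation_stats(model, torch.rand(64, 784)) for _ in range(5)]
    avg_activation = torch.stack(client_stats).mean(dim=0)  # server-side aggregation
    prune_dormant_neurons(model, avg_activation, prune_ratio=0.5)
    clip_extreme_weights(model, clip=1.0)
```

In this sketch the pruning decision relies only on activation statistics that clients can compute locally, so the server never needs raw client data; the subsequent fine-tuning step mentioned above would simply continue federated training on the pruned, clipped model.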