The goal of federated learning (FL) is to train a single global model by aggregating model parameters that are updated independently on edge devices, without accessing users' private data. However, FL is susceptible to backdoor attacks, in which a small fraction of malicious agents inject targeted misclassification behavior into the global model by uploading poisoned model updates to the server. In this work, we propose DifFense, an automated defense framework that protects an FL system from backdoor attacks by leveraging differential testing and two-step MAD (median absolute deviation) outlier detection, without requiring prior knowledge of attack scenarios or direct access to local model parameters. We empirically show that our detection method defends against varying numbers of potential attackers while consistently achieving convergence of the global model comparable to that obtained under federated averaging (FedAvg). We further corroborate the effectiveness and generalizability of our method by comparing it with prior defense techniques such as Multi-Krum and coordinate-wise median aggregation. Our detection method reduces the average backdoor accuracy of the global model to below 4% and achieves a false negative rate of zero.
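To make the outlier-detection component concrete, below is a minimal sketch of two-step MAD outlier detection in Python with NumPy. The function names (`mad_outliers`, `two_step_mad`), the 3.5 cutoff on the modified z-score, and the assumption that the second pass re-estimates the median and MAD after discarding first-pass outliers are illustrative choices, not details specified by the paper.

```python
import numpy as np

def mad_outliers(scores, threshold=3.5):
    """Flag outliers whose modified z-score, computed from the
    median absolute deviation (MAD), exceeds the threshold."""
    scores = np.asarray(scores, dtype=float)
    med = np.median(scores)
    mad = np.median(np.abs(scores - med))
    if mad == 0:
        # Degenerate case: more than half the scores are identical.
        return np.zeros(len(scores), dtype=bool)
    # 0.6745 makes the MAD consistent with the standard deviation
    # under a normal distribution.
    modified_z = 0.6745 * (scores - med) / mad
    return np.abs(modified_z) > threshold

def two_step_mad(scores, threshold=3.5):
    """Two-pass variant (an assumption about the paper's procedure):
    drop first-pass outliers, re-estimate the median and MAD on the
    remaining points, then test all scores against the refined fit."""
    scores = np.asarray(scores, dtype=float)
    first = mad_outliers(scores, threshold)
    kept = scores[~first]
    if len(kept) == 0:
        return first
    med = np.median(kept)
    mad = np.median(np.abs(kept - med))
    second = np.zeros(len(scores), dtype=bool)
    if mad > 0:
        second = np.abs(0.6745 * (scores - med) / mad) > threshold
    return first | second
```

In an FL setting, `scores` would be one per-client statistic per round (e.g., a divergence measure produced by the differential-testing step), and flagged clients would be excluded from aggregation for that round.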