Federated Learning (FL) enables collaborative deep learning training across multiple participants without exposing sensitive personal data. However, the distributed nature of FL and the unvetted data of its participants make it vulnerable to backdoor attacks, in which adversaries inject malicious functionality into the centralized model during training, causing intentional misclassifications of specific adversary-chosen inputs. While previous research has demonstrated that persistent backdoors can be successfully injected in FL, this persistence also poses a challenge: the continued presence of a backdoor in the centralized model can prompt the central aggregation server to take preventive measures and penalize the adversaries. This paper therefore proposes a methodology that enables adversaries to effectively remove backdoors from the centralized model once their objectives are achieved or once they suspect possible detection. The proposed approach extends the concept of machine unlearning and presents strategies that preserve the performance of the centralized model while preventing over-unlearning of information unrelated to the backdoor patterns, keeping the adversaries stealthy as the backdoors are removed. To the best of our knowledge, this is the first work to explore machine unlearning in FL for backdoor removal to the benefit of adversaries. An exhaustive evaluation on image classification scenarios demonstrates that the proposed method efficiently removes backdoors injected into the centralized model by state-of-the-art attacks across multiple configurations.
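To make the idea of adversarial backdoor unlearning concrete, the following is a minimal, hypothetical PyTorch-style sketch, not the paper's actual algorithm: a malicious client ascends the loss on trigger-stamped samples to erase the trigger-to-target mapping, while a clean-data loss and a proximal term to the current global weights limit over-unlearning of unrelated information. All names and hyperparameters (unlearn_backdoor, lambda_keep, the loaders) are illustrative assumptions.

```python
# Hypothetical sketch of adversarial backdoor unlearning at a malicious FL client.
# Assumption: the adversary still holds the trigger-stamped (poisoned) samples
# it used for injection, plus some benign local data.
import copy
import torch
import torch.nn.functional as F


def unlearn_backdoor(model, global_model, poisoned_loader, clean_loader,
                     epochs=1, lr=1e-3, lambda_keep=10.0, device="cpu"):
    """Return a local model whose update weakens the injected backdoor.

    poisoned_loader : batches of (trigger-stamped input, adversary target label)
    clean_loader    : batches of benign (input, label) pairs
    lambda_keep     : weight of the proximal penalty that limits over-unlearning
    """
    model = copy.deepcopy(model).to(device)
    global_params = [p.detach().clone().to(device) for p in global_model.parameters()]
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(epochs):
        for (x_p, y_p), (x_c, y_c) in zip(poisoned_loader, clean_loader):
            x_p, y_p = x_p.to(device), y_p.to(device)
            x_c, y_c = x_c.to(device), y_c.to(device)
            opt.zero_grad()

            # Gradient ascent on the backdoor objective: unlearn the
            # trigger -> target-label association.
            forget_loss = -F.cross_entropy(model(x_p), y_p)

            # Retain benign accuracy so the update resembles normal training.
            clean_loss = F.cross_entropy(model(x_c), y_c)

            # Proximal term to the global model discourages drift on
            # parameters unrelated to the backdoor pattern.
            prox = sum((p - g).pow(2).sum()
                       for p, g in zip(model.parameters(), global_params))

            loss = forget_loss + clean_loss + lambda_keep * prox
            loss.backward()
            opt.step()
    return model
```

The resulting model can be submitted as an ordinary local update; the clean-data term and the proximal penalty are what, under these assumptions, keep the unlearning step stealthy rather than visibly degrading the centralized model.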