In federated learning (FL), robust aggregation schemes have been developed to protect against malicious clients. Many robust aggregation schemes rely on a certain number of benign clients being present in a quorum of workers. This can be hard to guarantee when clients can join at will, or join based on factors such as being idle and connected to power and Wi-Fi. We tackle the scenario of securing FL systems that conduct adversarial training when a quorum of workers could be completely malicious. We model an attacker who poisons the model to insert a weakness into the adversarial training such that the model displays apparent adversarial robustness, while the attacker can exploit the inserted weakness to bypass the adversarial training and force the model to misclassify adversarial examples. We use abstract interpretation techniques to detect such stealthy attacks and block the corrupted model updates. We show that this defence can preserve adversarial robustness even against an adaptive attacker.
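To make the detection idea concrete, below is a minimal sketch (not the paper's implementation) of how a server could apply one abstract interpretation technique, interval bound propagation, to screen a submitted model update: certify a small trusted batch under an L-infinity perturbation and block updates whose certified robustness collapses. The ReLU-MLP layout, epsilon, and acceptance threshold are illustrative assumptions.

```python
import numpy as np

def interval_forward(weights, biases, lower, upper):
    """Propagate input intervals [lower, upper] through a ReLU MLP."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        # Standard interval bounds for an affine layer y = x W^T + b.
        new_lower = lower @ W_pos.T + upper @ W_neg.T + b
        new_upper = upper @ W_pos.T + lower @ W_neg.T + b
        lower, upper = new_lower, new_upper
        if i < len(weights) - 1:  # ReLU on hidden layers only
            lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)
    return lower, upper

def certified_accuracy(weights, biases, x, y, eps):
    """Fraction of points whose true-class logit lower bound beats every rival upper bound."""
    lo, up = interval_forward(weights, biases, x - eps, x + eps)
    certified = 0
    for i in range(len(y)):
        rival_up = np.delete(up[i], y[i]).max()
        certified += int(lo[i, y[i]] > rival_up)
    return certified / len(y)

def accept_update(weights, biases, val_x, val_y, eps=0.1, threshold=0.5):
    """Block the candidate update if certified robustness on a trusted batch is too low.
    eps and threshold are hypothetical values, chosen only for illustration."""
    return certified_accuracy(weights, biases, val_x, val_y, eps) >= threshold
```

In this sketch a stealthily poisoned update that preserves clean accuracy but undermines robustness would fail the certification check on the trusted batch, so the server would refuse to aggregate it.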