Malicious agents in collaborative learning and outsourced data collection threaten the training of clean models. Backdoor attacks, where an attacker poisons a model during training to achieve targeted misclassification, are a major concern for train-time robustness. In this paper, we investigate a multi-agent backdoor attack scenario, where multiple attackers attempt to backdoor a victim model simultaneously. A consistent backfiring phenomenon is observed across a wide range of games, where agents suffer from a low collective attack success rate. We examine different modes of backdoor attack configurations (non-cooperation/cooperation, joint distribution shifts, and game setups), all of which return an equilibrium attack success rate at the lower bound. The results motivate the re-evaluation of backdoor defense research for practical environments.
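To make the scenario concrete, the following is a minimal sketch (not from the paper) of the multi-agent poisoning setup, assuming image data, patch-style triggers, and a hypothetical `poison_shard` helper: each attacker stamps its own trigger and target label into the data shard it contributes, and the victim trains on the union of all shards.

```python
import numpy as np

rng = np.random.default_rng(0)

def stamp_trigger(x, patch, value=1.0):
    """Overwrite a small pixel region with an attacker-specific trigger patch."""
    x = x.copy()
    r, c, size = patch
    x[r:r + size, c:c + size] = value
    return x

def poison_shard(images, labels, patch, target_label, rate=0.3, rng=rng):
    """Backdoor a fraction `rate` of one attacker's shard: triggered inputs
    are relabelled to that attacker's target class."""
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), int(rate * len(images)), replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i], patch)
        labels[i] = target_label
    return images, labels

# Hypothetical setup: three attackers with distinct (trigger patch, target label).
attackers = [((0, 0, 3), 7), ((0, 25, 3), 1), ((25, 0, 3), 4)]

shards = []
for patch, target in attackers:
    clean_x = rng.random((100, 28, 28))      # stand-in for each agent's clean data
    clean_y = rng.integers(0, 10, size=100)
    shards.append(poison_shard(clean_x, clean_y, patch, target))

# The victim trains on the pooled shards; with many conflicting triggers, the
# paper observes the collective attack success rate collapses ("backfiring").
train_x = np.concatenate([x for x, _ in shards])
train_y = np.concatenate([y for _, y in shards])
print(train_x.shape, train_y.shape)
```

Under this setup, each attacker's success would be measured as the rate at which its own triggered test inputs are classified as its target label; the backfiring result is that this rate is low for all attackers simultaneously.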