Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results in detecting and erasing backdoor triggers, it remains unclear whether measures can be taken to prevent the triggers from being learned into the model in the first place. In this paper, we introduce the concept of \emph{anti-backdoor learning}, which aims to train clean models on backdoor-poisoned data. We frame the overall learning process as a dual task of learning the clean portion of the data and learning the backdoor portion of the data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) models learn backdoored data much faster than clean data, and the stronger the attack, the faster the model converges on the backdoored data; and 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage \emph{gradient ascent} mechanism into standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that models trained with ABL on backdoor-poisoned data achieve the same performance as if they were trained on purely clean data. Code is available at \underline{https://github.com/bboylyg/ABL}.
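To make the two-stage gradient-ascent mechanism described above more concrete, the following is a minimal PyTorch-style sketch of the ABL idea, not the authors' released implementation. It assumes data loaders that yield (input, label, index) triples; the hyperparameter names (\texttt{gamma}, \texttt{isolation\_ratio}) and the helper structure are illustrative assumptions.

\begin{verbatim}
# Minimal sketch of two-stage anti-backdoor learning (assumptions noted above).
import torch
import torch.nn.functional as F

def stage1_isolate(model, loader, optimizer, gamma=0.5,
                   isolation_ratio=0.01, epochs=10, device="cpu"):
    """Stage 1: train with a sign-flipped loss so low-loss (likely
    backdoored) examples can be isolated early."""
    model.train()
    for _ in range(epochs):
        for x, y, _ in loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            # Gradient ascent when the loss drops below gamma; descent otherwise.
            loss = torch.sign(loss - gamma) * loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    # Rank training examples by per-sample loss; flag the lowest-loss
    # fraction as the suspected backdoor subset.
    model.eval()
    losses, indices = [], []
    with torch.no_grad():
        for x, y, idx in loader:
            x, y = x.to(device), y.to(device)
            per_sample = F.cross_entropy(model(x), y, reduction="none")
            losses.append(per_sample.cpu())
            indices.append(idx)
    losses, indices = torch.cat(losses), torch.cat(indices)
    k = int(isolation_ratio * len(losses))
    suspected = indices[torch.argsort(losses)[:k]]
    return set(suspected.tolist())

def stage2_unlearn(model, clean_loader, isolated_loader, optimizer,
                   epochs=5, device="cpu"):
    """Stage 2: keep learning the presumed-clean data while applying
    gradient ascent on the isolated subset to break the trigger-target link."""
    model.train()
    for _ in range(epochs):
        for x, y, _ in clean_loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            optimizer.zero_grad(); loss.backward(); optimizer.step()
        for x, y, _ in isolated_loader:
            x, y = x.to(device), y.to(device)
            loss = -F.cross_entropy(model(x), y)  # negated loss: unlearn the backdoor
            optimizer.zero_grad(); loss.backward(); optimizer.step()
\end{verbatim}

The key design choice in this sketch is that both stages reuse the same gradient-ascent primitive: in stage 1 it caps how small the loss can become (exposing fast-learned backdoor examples), and in stage 2 it is applied directly to the isolated subset to undo the learned trigger-target correlation.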