DNNs' demand for massive training data forces practitioners to collect data from the Internet without careful vetting, since manual inspection at that scale is prohibitively expensive; this exposes models to the risk of backdoor attacks. A backdoored model consistently predicts an attacker-chosen target class whenever a predefined trigger pattern is present, and such behavior can be implanted simply by poisoning a small fraction of the training data. In general, adversarial training is believed to defend against backdoor attacks, since it encourages models to keep their predictions unchanged even when the input image is perturbed within a feasible range. Unfortunately, few previous studies have succeeded in demonstrating such a defense. To explore whether adversarial training can defend against backdoor attacks, we conduct extensive experiments across different threat models and perturbation budgets, and find that the threat model used in adversarial training matters. For instance, adversarial training with spatial adversarial examples provides notable robustness against commonly used patch-based backdoor attacks. We further propose a hybrid strategy that provides satisfactory robustness across different backdoor attacks.
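For concreteness, below is a minimal PyTorch-style sketch of the standard adversarial training loop the abstract alludes to: an inner maximization that finds a worst-case perturbation within a feasible budget, followed by an outer minimization on the perturbed inputs. This is the common PGD formulation under an l_inf constraint, not necessarily the paper's exact procedure; the spatial threat model mentioned above would replace the additive perturbation with bounded spatial transformations. The values of eps, alpha, and steps are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization: PGD within an l_inf ball of radius eps.
    (Other threat models, e.g. spatial transformations, would swap out
    this additive-perturbation search; hyperparameters are illustrative.)"""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascend the loss, then project back into the l_inf ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    # Keep the perturbed image in the valid pixel range [0, 1].
    return (x + delta).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: one training step on worst-case inputs."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```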