Adversarial attacks against deep neural networks (DNNs) are continuously evolving, requiring increasingly powerful defense strategies. We develop a novel adversarial defense framework inspired by the adaptive immune system: the Robust Adversarial Immune-inspired Learning System (RAILS). By initializing a population of exemplars that is balanced across classes, RAILS starts from a uniform label distribution that encourages diversity and debiases a potentially corrupted initial condition. RAILS then implements an evolutionary optimization process that adjusts the label distribution to achieve specificity toward the ground-truth class. RAILS thereby exhibits a tradeoff between robustness (diversity) and accuracy (specificity), providing a new immune-inspired perspective on adversarial learning. We empirically validate the benefits of RAILS through adversarial image classification experiments on the MNIST, SVHN, and CIFAR-10 datasets. Under the PGD attack, RAILS improves robustness over existing methods by at least 5.62%, 12.5%, and 10.32%, respectively, without appreciable loss of standard accuracy.
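To make the described pipeline concrete, the following is a minimal sketch of the kind of immune-inspired label refinement the abstract outlines: a class-balanced exemplar population (uniform initial label distribution) is evolved toward a query input through repeated selection and mutation, and the final population yields a refined label distribution. The affinity measure (negative squared Euclidean distance), the mutation and selection rules, and all function and parameter names here are illustrative assumptions, not the paper's actual algorithm or API.

import numpy as np

def rails_label_refinement(x, exemplars, labels, n_classes,
                           generations=10, pop_per_class=20,
                           mutation_scale=0.05, rng=None):
    """Evolve a class-balanced exemplar population toward query x and
    return a refined label distribution (illustrative sketch only)."""
    rng = np.random.default_rng() if rng is None else rng

    # Class-balanced initialization: equally many exemplars per class,
    # so the implied label distribution starts uniform (diversity).
    pop, pop_labels = [], []
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        take = rng.choice(idx, size=min(pop_per_class, idx.size), replace=False)
        pop.append(exemplars[take])
        pop_labels.append(labels[take])
    pop = np.concatenate(pop)
    pop_labels = np.concatenate(pop_labels)

    for _ in range(generations):
        # Affinity of each population member to the query (higher is better);
        # negative squared distance is a stand-in for the paper's affinity score.
        affinity = -np.sum((pop - x) ** 2, axis=1)

        # Selection: keep the top half of the population by affinity.
        keep = np.argsort(affinity)[-len(pop) // 2:]
        parents, parent_labels = pop[keep], pop_labels[keep]

        # Mutation: perturb copies of the survivors to explore locally.
        children = parents + mutation_scale * rng.standard_normal(parents.shape)
        pop = np.concatenate([parents, children])
        pop_labels = np.concatenate([parent_labels, parent_labels])

    # Specificity: affinity-weighted vote of the final population gives the
    # refined label distribution for the query.
    weights = np.exp(-np.sum((pop - x) ** 2, axis=1))
    dist = np.bincount(pop_labels, weights=weights, minlength=n_classes)
    return dist / dist.sum()

The diversity/specificity tradeoff in the abstract maps onto the number of generations and the selection pressure in this sketch: fewer generations keep the label distribution closer to uniform (robustness), while more generations concentrate it on the highest-affinity class (accuracy).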