The security of Deep Learning classifiers is a critical field of study because of the existence of adversarial attacks. Such attacks usually rely on the principle of transferability: an adversarial example crafted on a surrogate classifier tends to mislead a target classifier trained on the same dataset, even when the two classifiers have substantially different architectures. Ensemble defenses against adversarial attacks build on the observation that an adversarial example is less likely to mislead multiple classifiers in an ensemble whose decision boundaries are diverse. However, recent ensemble methods have either been shown to be vulnerable to stronger adversaries or lack an end-to-end evaluation. This paper develops a new ensemble methodology that constructs multiple diverse classifiers by training them with a Pairwise Adversarially Robust Loss (PARL) function. PARL is computed from the gradients of every layer with respect to the input in all classifiers of the ensemble simultaneously. The proposed training procedure enables PARL to achieve higher robustness against black-box transfer attacks than previous ensemble methods, without degrading accuracy on clean examples. We also evaluate robustness against white-box attacks, where adversarial examples are crafted using the parameters of the target classifier. We present extensive experiments on standard image classification datasets (CIFAR-10 and CIFAR-100) with the standard ResNet20 classifier against state-of-the-art adversarial attacks to demonstrate the robustness of the proposed ensemble methodology.
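Since the abstract only summarizes PARL, the following is a minimal sketch of how a pairwise layer-gradient diversity penalty of this kind could be implemented in PyTorch for a two-classifier ensemble. The model interface (each model returning logits plus a list of intermediate activations), the helpers `layer_input_gradients` and `parl_style_loss`, and the weight `lambda_div` are hypothetical names introduced for illustration; the paper's exact PARL formulation may differ.

```python
import torch
import torch.nn.functional as F


def layer_input_gradients(x, layer_outputs):
    # Gradient of each recorded layer's (summed) activations w.r.t. the input x.
    # create_graph=True keeps the penalty differentiable so it can be trained on.
    grads = []
    for h in layer_outputs:
        (g,) = torch.autograd.grad(h.sum(), x, create_graph=True, retain_graph=True)
        grads.append(g.flatten(start_dim=1))
    return grads


def parl_style_loss(f1, f2, x, y, lambda_div=0.5):
    # Cross-entropy for both classifiers plus a penalty on aligned layer-wise
    # input gradients, encouraging the two decision boundaries to diverge.
    x = x.clone().requires_grad_(True)
    logits1, layers1 = f1(x)  # assumed interface: each model returns
    logits2, layers2 = f2(x)  # (logits, list of intermediate activations)
    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
    g1 = layer_input_gradients(x, layers1)
    g2 = layer_input_gradients(x, layers2)
    # Mean cosine similarity across layers; lower similarity = more diverse gradients.
    align = sum(F.cosine_similarity(a, b, dim=1).mean() for a, b in zip(g1, g2))
    return ce + lambda_div * align / len(g1)
```

In this sketch, minimizing the cosine-similarity term decorrelates the input gradients of corresponding layers across the two classifiers, which is one plausible way to realize the pairwise, layer-wise gradient objective the abstract describes.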