Neural network classifiers are vulnerable to misclassification of adversarial samples, and the current best defense trains classifiers with adversarial samples. However, based on the minimization at the core of adversarial attacks, adversarial samples are not optimal for steering attack convergence. The perturbation term of this minimization can be driven towards $0$ by replacing adversarial samples in training with duplicated original samples that are labeled differently only for training. Using only original samples, Target Training eliminates the need to generate adversarial samples for training against all attacks that minimize perturbation. In low-capacity classifiers and without using adversarial samples, Target Training exceeds both default CIFAR10 accuracy ($84.3$%) and current best defense accuracy (below $25$%), with $84.8$% against the CW-L$_2$($\kappa=0$) attack and $86.6$% against DeepFool. Using adversarial samples against attacks that do not minimize perturbation, Target Training exceeds the current best defense ($69.1$%) with $76.4$% against CW-L$_2$($\kappa=40$) in CIFAR10.