We propose two training techniques for improving the robustness of neural networks to adversarial attacks, i.e., input manipulations maliciously crafted to fool networks into incorrect predictions. Both methods are independent of the chosen attack and leverage random projections of the original inputs, exploiting both dimensionality reduction and characteristic geometrical properties of adversarial perturbations. The first technique, RP-Ensemble, consists of an ensemble of networks trained on multiple projected versions of the original inputs. The second, RP-Regularizer, instead adds a regularization term to the training objective.
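The projection step shared by both techniques can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the Gaussian construction, and all dimensions are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projections(x, n_proj=4, proj_dim=8):
    """Project flattened inputs onto several random Gaussian subspaces.

    Hypothetical sketch of the random-projection step underlying
    RP-Ensemble and RP-Regularizer; in RP-Ensemble, one network would
    be trained on each projected version of the data.
    """
    d = x.shape[1]
    # One Gaussian matrix per projection, scaled as in the standard
    # Johnson-Lindenstrauss construction.
    mats = [rng.normal(size=(d, proj_dim)) / np.sqrt(proj_dim)
            for _ in range(n_proj)]
    return [x @ m for m in mats]

x = rng.normal(size=(16, 32))      # batch of 16 flattened inputs
views = random_projections(x)
print(len(views), views[0].shape)  # 4 projected versions, each (16, 8)
```

Each projected view is a lower-dimensional copy of the batch; an ensemble member trained on each view sees the data through a different random subspace.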