While many defences against adversarial examples have been proposed, finding robust machine learning models is still an open problem. The most compelling defence to date is adversarial training, which consists of complementing the training data set with adversarial examples. Yet adversarial training severely increases training time and depends on finding representative adversarial samples. In this paper we propose to train models on output spaces with large class separation in order to gain robustness without adversarial training. We introduce a method to partition the output space into class prototypes with large separation and to train models to preserve it. Experimental results show that models trained with these prototypes -- which we call deep repulsive prototypes -- gain robustness competitive with adversarial training, while also preserving higher accuracy on natural samples. Moreover, the models are more resilient to large perturbation sizes. For example, without adversarial training we obtained over 50% robustness for CIFAR-10, with 92% accuracy on natural samples, and over 20% robustness for CIFAR-100, with 71% accuracy on natural samples. For both data sets, the models preserved robustness against large perturbations better than adversarially trained models.
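The sketch below illustrates one plausible reading of the construction described above, assuming prototypes live on a unit hypersphere and are spread apart by pairwise repulsion before training; the names `make_repulsive_prototypes` and `prototype_loss` are illustrative and not taken from the paper.

```python
# Minimal sketch (PyTorch), not the paper's implementation: first fix a set of
# maximally separated class prototypes, then train the network to map each
# sample's output onto its class prototype.
import torch
import torch.nn.functional as F

def make_repulsive_prototypes(num_classes: int, dim: int,
                              steps: int = 1000, lr: float = 0.1) -> torch.Tensor:
    """Place one point per class on the unit hypersphere and repeatedly push
    the closest pair apart, yielding prototypes with large separation."""
    protos = torch.randn(num_classes, dim, requires_grad=True)
    opt = torch.optim.SGD([protos], lr=lr)
    for _ in range(steps):
        p = F.normalize(protos, dim=1)            # project onto the sphere
        sim = p @ p.t()                           # pairwise cosine similarities
        sim = sim - 2.0 * torch.eye(num_classes)  # mask out self-similarity
        loss = sim.max()                          # similarity of the closest pair
        opt.zero_grad()
        loss.backward()                           # minimising it repels that pair
        opt.step()
    return F.normalize(protos.detach(), dim=1)

def prototype_loss(outputs: torch.Tensor, targets: torch.Tensor,
                   prototypes: torch.Tensor) -> torch.Tensor:
    """Pull each normalised network output toward its fixed class prototype."""
    z = F.normalize(outputs, dim=1)
    return (1.0 - (z * prototypes[targets]).sum(dim=1)).mean()
```

Under this reading, the prototypes replace the usual one-hot softmax targets, and at test time a sample would be assigned to the class of its nearest prototype; the fixed, well-separated targets are what allow robustness to be gained without generating adversarial examples during training.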