Current machine learning models achieve superhuman performance in many real-world applications, yet they remain susceptible to imperceptible adversarial perturbations. The most effective defense against this problem is adversarial training, which trains the model on adversarially perturbed samples instead of the original ones. Various methods have been developed in recent years to improve adversarial training, such as data augmentation or modifying the training attacks. In this work, we examine the same problem from a new, data-centric perspective. To this end, we first demonstrate that existing model-based methods can be equivalent to applying smaller perturbations or lower optimization weights to the hard training examples. Building on this finding, we propose detecting and removing these hard samples directly from the training procedure, rather than applying complicated algorithms to mitigate their effects. For detection, we use the maximum softmax probability, an effective method in out-of-distribution detection, since the hard samples can be regarded as out-of-distribution with respect to the overall data distribution. Our results on the SVHN and CIFAR-10 datasets show the effectiveness of this method in improving adversarial training without adding much computational cost.
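As a rough illustration of the detection step described above, the following is a minimal sketch of scoring samples by their maximum softmax probability (MSP) and keeping only the confidently classified ("easy") ones. It assumes a standard PyTorch classifier and data loader; the function names and the threshold value are hypothetical choices for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def msp_scores(model, loader, device="cpu"):
    """Compute the maximum softmax probability (MSP) for every sample.

    Samples with a low MSP are treated as 'hard' (out-of-distribution
    with respect to the bulk of the data distribution) and can be
    removed before adversarial training.
    """
    model.eval()
    scores = []
    with torch.no_grad():
        for x, _ in loader:
            logits = model(x.to(device))
            probs = F.softmax(logits, dim=1)
            scores.append(probs.max(dim=1).values.cpu())
    return torch.cat(scores)

def easy_sample_indices(scores, threshold=0.5):
    """Indices of samples whose MSP exceeds the (hypothetical) threshold."""
    return (scores > threshold).nonzero(as_tuple=True)[0]
```

The retained indices can then be wrapped in a filtered dataset (e.g., torch.utils.data.Subset) before running the usual adversarial training loop, so the only extra cost is one scoring pass over the training set.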