This work proposes a novel perspective on adversarial attacks by introducing the concepts of sample attackability and sample robustness. Adversarial attacks add small, imperceptible perturbations to the input that cause large, undesired changes in the output of deep learning models. Despite extensive research on generating adversarial attacks and building defense systems, there has been limited research on understanding adversarial attacks from an input-data perspective. We propose a deep-learning-based method for detecting the most attackable and most robust samples in an unseen dataset for an unseen target model. The proposed method is based on a neural network architecture that takes a sample as input and outputs a measure of its attackability or robustness. The method is evaluated across a range of models and attack methods, and the results demonstrate its effectiveness in identifying the samples most susceptible to adversarial attacks. Understanding sample attackability can have important implications for future work in sample-selection tasks. For example, in active learning the acquisition function can be designed to select the most attackable samples, and in adversarial training only the most attackable samples need be selected for augmentation.
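To make the described detector concrete, the following is a minimal sketch, not the authors' implementation, of a sample-level attackability predictor: a small network that maps an input sample to a scalar score, where higher values indicate samples predicted to be easier to attack. The `AttackabilityPredictor` name, architecture, and layer sizes are illustrative assumptions.

```python
# Hedged sketch of a deep-learning-based attackability predictor.
# Assumptions: image inputs, a lightweight convolutional encoder, and a
# sigmoid head producing one attackability score per sample.
import torch
import torch.nn as nn

class AttackabilityPredictor(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        # Encoder: conv + global average pooling to a fixed-size feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Head: scalar attackability score in [0, 1].
        self.head = nn.Sequential(nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch of input samples; returns one score per sample.
        return self.head(self.encoder(x)).squeeze(-1)

# Usage: score an unseen batch and select the most attackable samples,
# e.g. for an active-learning acquisition function or adversarial training.
model = AttackabilityPredictor()
batch = torch.randn(8, 3, 32, 32)          # hypothetical CIFAR-sized images
scores = model(batch)                       # shape: (8,)
most_attackable = scores.topk(k=2).indices  # indices of the top-scoring samples
```

In practice such a predictor would be trained on attackability labels derived from running attacks against held-out models, so that it generalizes to unseen datasets and unseen target models as described in the abstract.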