Adversarial attacks on a convolutional neural network (CNN), which inject human-imperceptible perturbations into an input image, can fool a high-performance CNN into making incorrect predictions. The success of adversarial attacks raises serious concerns about the robustness of CNNs and prevents them from being used in safety-critical applications, such as medical diagnosis and autonomous driving. Our work introduces a visual analytics approach to understanding adversarial attacks by answering two questions: (1) which neurons are more vulnerable to attacks, and (2) which image features these vulnerable neurons capture during the prediction. For the first question, we introduce multiple perturbation-based measures to break down the attack magnitude onto individual CNN neurons and rank the neurons by their vulnerability levels. For the second, we identify image features (e.g., cat ears) that highly stimulate a user-selected neuron to augment and validate the neuron's responsibility. Furthermore, we support interactive exploration of a large number of neurons with the aid of hierarchical clustering based on the neurons' roles in the prediction. To this end, we design a visual analytics system that incorporates visual reasoning for interpreting adversarial attacks. We validate the effectiveness of our system through multiple case studies as well as feedback from domain experts.
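To make the perturbation-based idea concrete, below is a minimal sketch, assuming PyTorch and a pretrained ResNet-18: an FGSM attack stands in for the adversarial perturbation, and the mean per-channel activation change between the benign and adversarial inputs stands in for a neuron vulnerability measure. Both choices are illustrative assumptions, not the paper's actual measures or system.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Pretrained CNN used purely for illustration.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def fgsm_attack(image, label, epsilon=0.03):
    """Craft an adversarial example with the Fast Gradient Sign Method (illustrative attack)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    return (image + epsilon * image.grad.sign()).detach()

def neuron_vulnerability(benign, adversarial, layer):
    """Rank channels of `layer` by how much the attack shifts their activations
    (a simple stand-in for a perturbation-based vulnerability measure)."""
    activations = {}
    handle = layer.register_forward_hook(
        lambda m, inp, out: activations.setdefault("out", []).append(out.detach())
    )
    with torch.no_grad():
        model(benign)
        model(adversarial)
    handle.remove()
    benign_act, adv_act = activations["out"]
    # Mean absolute activation change per channel.
    delta = (adv_act - benign_act).abs().mean(dim=(0, 2, 3))
    return delta.argsort(descending=True)  # most "vulnerable" channels first

# Usage (hypothetical inputs): x is a normalized 1x3x224x224 image tensor, y its label tensor.
# x_adv = fgsm_attack(x, y)
# ranking = neuron_vulnerability(x, x_adv, model.layer4[1].conv2)
```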