Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications, but they are vulnerable to adversarial examples, which motivates the evaluation and benchmarking of model robustness. However, current evaluations usually rely on simple metrics to study the performance of defenses, which fall far short of revealing the limitations and weaknesses of these defense methods. Consequently, most proposed defenses are quickly shown to be successfully attacked, resulting in an ``arms race'' between attacks and defenses. To mitigate this problem, we establish a model robustness evaluation framework containing 23 comprehensive and rigorous metrics, which consider two key perspectives of adversarial learning (i.e., data and model). Through neuron coverage and data imperceptibility, we use data-oriented metrics to measure the integrity of test examples; by delving into model structure and behavior, we exploit model-oriented metrics to further evaluate robustness in the adversarial setting. To fully demonstrate the effectiveness of our framework, we conduct large-scale experiments on multiple datasets, including CIFAR-10, SVHN, and ImageNet, using different models and defenses with our open-source platform. Overall, our paper provides a comprehensive evaluation framework with which researchers can conduct thorough and fast evaluations using the open-source toolkit, and the analytical results can inspire deeper understanding of and further improvements to model robustness.
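As a concrete illustration of one data-oriented metric mentioned above, the sketch below computes neuron coverage in the style common in the DNN-testing literature: the fraction of neurons whose (batch-normalized) activation exceeds a threshold on at least one test input. The function name `neuron_coverage`, the min-max scaling, and the threshold of 0.5 are illustrative assumptions, not necessarily the exact definition used in the framework.

```python
import numpy as np

def neuron_coverage(layer_activations, threshold=0.5):
    """Fraction of neurons activated above `threshold` by at least one input.

    layer_activations: list of arrays, one per layer, each of shape
        (num_inputs, num_neurons).
    threshold: cutoff applied after min-max scaling each neuron's
        activations across the batch (an assumed convention).
    """
    covered, total = 0, 0
    for layer in layer_activations:
        # Scale each neuron's activations to [0, 1] across the batch;
        # constant neurons scale to 0 and thus count as uncovered.
        lo = layer.min(axis=0)
        hi = layer.max(axis=0)
        scaled = (layer - lo) / np.maximum(hi - lo, 1e-12)
        # A neuron is covered if any input pushes it above the threshold.
        covered += int(np.sum(scaled.max(axis=0) > threshold))
        total += layer.shape[1]
    return covered / total
```

For example, a layer whose third neuron is constant across all inputs leaves that neuron uncovered, so two of three neurons yield a coverage of 2/3. A larger, more diverse test set generally raises coverage, which is why such metrics serve as a proxy for the integrity of the test examples.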