Recently, RobustBench (Croce et al. 2020) has become a widely recognized benchmark for the adversarial robustness of image classification networks. In its most commonly reported sub-task, RobustBench evaluates and ranks the adversarial robustness of trained neural networks on CIFAR10 under AutoAttack (Croce and Hein 2020b) with l-inf perturbations limited to eps = 8/255. With the leading scores of the currently best-performing models at around 60% of the baseline, it is fair to characterize this benchmark as quite challenging. Despite its general acceptance in recent literature, we aim to foster discussion about the suitability of RobustBench as a key indicator of robustness that generalizes to practical applications. Our line of argumentation against this is two-fold and supported by extensive experiments presented in this paper: We argue that I) the alteration of data by AutoAttack with l-inf, eps = 8/255 is unrealistically strong, resulting in close-to-perfect detection rates of adversarial samples, even by simple detection algorithms and human observers; we also show that other attack methods are much harder to detect while achieving similar success rates. II) Results on low-resolution data sets like CIFAR10 do not generalize well to higher-resolution images, as gradient-based attacks appear to become even more detectable with increasing resolution.
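To make the threat model concrete: the l-inf, eps = 8/255 constraint means every pixel of the adversarial image may deviate from the clean image by at most 8/255 (about 3.1% of the dynamic range). The following minimal sketch, using NumPy on a toy CIFAR10-sized array (function name and values are illustrative, not from the paper), shows the standard projection that enforces this bound:

```python
import numpy as np

EPS = 8 / 255  # l-inf budget used in RobustBench's most common sub-task

def project_linf(x_adv, x, eps=EPS):
    """Project an adversarial image back into the l-inf ball around x.

    x, x_adv: float arrays with pixel values in [0, 1].
    """
    delta = np.clip(x_adv - x, -eps, eps)   # bound each pixel change by eps
    return np.clip(x + delta, 0.0, 1.0)     # keep a valid pixel range

# Toy usage: a perturbation larger than eps gets clipped back to the budget.
x = np.full((3, 32, 32), 0.5)   # CIFAR10-shaped dummy image
x_adv = x + 0.1                 # 0.1 > 8/255, so it must be projected
projected = project_linf(x_adv, x)
print(np.abs(projected - x).max())  # maximal per-pixel change equals 8/255
```

Attacks such as AutoAttack apply exactly this kind of projection after every update step, which is why the resulting perturbations, while norm-bounded, can still be large enough to be visually and statistically conspicuous.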