RobustBench: a standardized adversarial robustness benchmark

From arXiv. Version 2: 90+ evaluations, 60+ models, 5 leaderboards ($\ell_\infty$, $\ell_2$, common corruptions), significantly expanded analysis (calibration, fairness, privacy leakage, smoothness, transferability).

As a research community, we are still lacking a systematic understanding of the progress on adversarial robustness, which often makes it hard to identify the most promising ideas in training robust models. A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to overestimation of the true robustness of models. While adaptive attacks designed for a particular defense are a potential solution, they have to be highly customized for particular models, which makes it difficult to compare different methods. Our goal is to instead establish a standardized benchmark of adversarial robustness, which as accurately as possible reflects the robustness of the considered models within a reasonable computational budget. To evaluate the robustness of models for our benchmark, we consider AutoAttack, an ensemble of white- and black-box attacks which was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications. We also impose some restrictions on the admitted models to rule out defenses that only make gradient-based attacks ineffective without improving actual robustness. Our leaderboard, hosted at https://robustbench.github.io/, contains evaluations of 90+ models and aims at reflecting the current state of the art on a set of well-defined tasks in $\ell_\infty$- and $\ell_2$-threat models and on common corruptions, with possible extensions in the future. Additionally, we open-source the library https://github.com/RobustBench/robustbench that provides unified access to 60+ robust models to facilitate their downstream applications. Finally, based on the collected models, we analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
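To make concrete why a single attack can overestimate robustness while an ensemble gives a tighter estimate, here is a minimal sketch in plain Python. It is not the AutoAttack implementation and does not use the RobustBench library. The function name `robust_accuracy`, the attack labels, and the per-point outcomes are all hypothetical and made up for illustration. The only assumption carried over from the abstract is that a test point counts as robust only if it is classified correctly and no attack in the ensemble finds an adversarial example for it.

```python
from typing import Dict, List


def robust_accuracy(clean_correct: List[bool],
                    attack_success: Dict[str, List[bool]]) -> float:
    """Fraction of points that are correctly classified and survive every attack.

    clean_correct[i]        -- True if point i is classified correctly without attack
    attack_success[name][i] -- True if attack `name` found an adversarial example for point i
    """
    n = len(clean_correct)
    if n == 0:
        raise ValueError("need at least one test point")
    robust = list(clean_correct)  # a point misclassified on clean data is never robust
    for name, successes in attack_success.items():
        if len(successes) != n:
            raise ValueError(f"attack {name!r} has {len(successes)} outcomes, expected {n}")
        # Worst case over attacks: one successful attack is enough to break a point.
        robust = [r and not s for r, s in zip(robust, successes)]
    return sum(robust) / n


# Toy outcomes for 8 test points (invented, not real evaluation results).
clean_correct = [True, True, True, True, True, True, False, True]
attack_success = {
    "white_box_a": [True, False, False, False, True, False, False, False],
    "white_box_b": [False, True, False, False, True, False, False, False],
    "black_box":   [False, False, False, True, False, False, False, False],
}

single = robust_accuracy(clean_correct, {"white_box_a": attack_success["white_box_a"]})
ensemble = robust_accuracy(clean_correct, attack_success)
print(f"single attack: {single:.3f}, ensemble: {ensemble:.3f}")
```

In this toy setting the single attack reports a higher robust accuracy than the ensemble, because each attack breaks some points the others miss. This is the kind of gap that a standardized ensemble evaluation is meant to close.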
