The bulk of existing research on defending against adversarial examples focuses on a single (typically Lp-norm-bounded) attack, but in practical settings, machine learning (ML) models should be robust to a wide variety of attacks. In this paper, we present the first unified framework for considering multiple attacks against ML models. Our framework can model different levels of the learner's knowledge about the test-time adversary, allowing us to capture robustness against unforeseen attacks as well as robustness against unions of attacks. Using our framework, we present the first leaderboard, MultiRobustBench, for benchmarking multi-attack robustness, which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including Lp-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total). Additionally, we analyze the current state of defenses against multiple attacks. Our analysis shows that while existing defenses have made progress in terms of average robustness across the set of attacks used, robustness against the worst-case attack remains a significant open problem: all existing models perform worse than random guessing.
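The two aggregate metrics in the analysis above can be made concrete. Below is a minimal sketch, assuming per-attack robust accuracies have already been measured for one defended model; the attack names and accuracy values are hypothetical placeholders, not results from the MultiRobustBench leaderboard.

```python
import numpy as np

# Hypothetical per-attack robust accuracies for one defended model
# (placeholder values, not leaderboard results).
accuracies = {
    "linf": 0.52,      # Lp-based threat model (L-infinity ball)
    "l2": 0.48,        # Lp-based threat model (L2 ball)
    "spatial": 0.35,   # spatial transformation attack
    "recolor": 0.41,   # color-change attack
}

# Average robustness: mean robust accuracy over the attack set.
avg_robustness = float(np.mean(list(accuracies.values())))

# Taking the minimum over per-attack accuracies gives only an *upper bound*
# on robustness against the union of attacks: the true worst-case metric
# takes the minimum per example (an input counts as robust only if the
# model survives every attack on it), which can only be lower.
union_upper_bound = min(accuracies.values())

print(f"average robustness:             {avg_robustness:.3f}")
print(f"union robustness (upper bound): {union_upper_bound:.3f}")
```

The gap between these two numbers is exactly the phenomenon the analysis highlights: a model can post a respectable average while its worst-case (union) robustness collapses.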