Since the discovery of adversarial attacks against machine learning models nearly a decade ago, research on adversarial machine learning has rapidly evolved into an eternal war between defenders, who seek to increase the robustness of ML models against adversarial attacks, and adversaries, who seek to develop better attacks capable of weakening or defeating these defenses. This domain, however, has found little buy-in from ML practitioners, who are neither overtly concerned about these attacks affecting their systems in the real world nor willing to trade off the accuracy of their models in pursuit of robustness against them. In this paper, we motivate the design and implementation of Ares, an evaluation framework for adversarial ML that allows researchers to explore attacks and defenses in a realistic wargame-like environment. Ares frames the conflict between the attacker and the defender as two agents with opposing objectives in a reinforcement learning environment. This framing enables system-level evaluation metrics, such as time to failure, and the evaluation of complex strategies, such as moving target defenses. We report the results of our initial exploration involving a white-box attacker against an adversarially trained defender.
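To make the wargame framing concrete, the following is a minimal toy sketch of an attacker-versus-defender episode that reports a time-to-failure value. It is not the Ares implementation: the function names (`run_episode`, `mean_time_to_failure`), the integer `steps_to_break` robustness stand-in, and the random rotation policy used for the moving target defense are all assumptions made for illustration, and no real model or attack is involved.

```python
import random


def run_episode(steps_to_break, moving_target, max_steps=1000, seed=0):
    """Play one toy attacker-vs-defender episode and return the time to failure.

    steps_to_break[i] is a hypothetical stand-in for the robustness of model i:
    the number of attack iterations the attacker must spend on model i before
    that model is fooled. Returns max_steps if the defender never fails.
    """
    rng = random.Random(seed)
    progress = [0] * len(steps_to_break)  # attacker effort accumulated per model
    active = 0                            # index of the model currently serving queries

    for t in range(1, max_steps + 1):
        # White-box attacker (assumed): observes the model that is active at the
        # start of the step and spends one attack iteration on it.
        target = active
        progress[target] += 1

        # Defender: with a moving target defense, rotate to a random model
        # before the adversarial query arrives.
        if moving_target:
            active = rng.randrange(len(steps_to_break))

        # The attack lands only if it was crafted against the model that serves
        # the query and the attacker has invested enough effort in that model.
        if target == active and progress[target] >= steps_to_break[target]:
            return t

    return max_steps


def mean_time_to_failure(steps_to_break, moving_target, episodes=100):
    """Average the time to failure over several seeded episodes."""
    total = sum(
        run_episode(steps_to_break, moving_target, seed=s) for s in range(episodes)
    )
    return total / episodes


if __name__ == "__main__":
    print("static defender:", mean_time_to_failure([5], moving_target=False))
    print("moving target  :", mean_time_to_failure([5, 5, 5], moving_target=True))
```

In this sketch the static defender always fails after exactly `steps_to_break[0]` steps, while the rotating defender can never fail sooner than that, which is the kind of system-level comparison a time-to-failure metric is meant to expose.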