Machine learning is vulnerable to adversarial examples: inputs designed to cause models to perform poorly. However, it is unclear whether adversarial examples represent realistic inputs in the modeled domains. Diverse domains such as networks and phishing have domain constraints: complex relationships between features that an adversary must satisfy for an attack to be realized (in addition to any adversary-specific goals). In this paper, we explore how domain constraints limit adversarial capabilities and how adversaries can adapt their strategies to create realistic (constraint-compliant) examples. To this end, we develop techniques to learn domain constraints from data and show how the learned constraints can be integrated into the adversarial crafting process. We evaluate the efficacy of our approach on network intrusion and phishing datasets and find that: (1) up to 82% of adversarial examples produced by state-of-the-art crafting algorithms violate domain constraints, and (2) domain constraints are robust to adversarial examples; enforcing them increases model accuracy by up to 34%. We observe not only that adversaries must alter inputs to satisfy domain constraints, but also that these constraints make the generation of valid adversarial examples far more challenging.
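To make the idea of constraint-compliant crafting concrete, the following is a minimal sketch, not the paper's actual algorithm: it assumes constraints have been learned as callable predicates over feature vectors and that the sign of a loss gradient is available from some attack. The names (`is_compliant`, `project`, `craft`) and the toy constraints are hypothetical, chosen only to illustrate how a perturbation can be checked against, and repaired to satisfy, learned domain constraints.

```python
# Hypothetical sketch: enforcing learned domain constraints during
# adversarial crafting by shrinking a violating perturbation back
# toward the original (known-valid) input.
import numpy as np

# Hypothetical learned constraints for a network-flow-like feature space.
# Each predicate returns True iff the input is constraint-compliant.
constraints = [
    lambda x: x[0] >= 0,                  # e.g., packet count is non-negative
    lambda x: x[1] <= x[2],               # e.g., bytes sent <= total bytes
    lambda x: (x[3] == 0) or (x[4] > 0),  # e.g., a set flag implies non-zero duration
]

def is_compliant(x, constraints):
    """Return True iff x satisfies every learned domain constraint."""
    return all(c(x) for c in constraints)

def project(x, x_orig, constraints, steps=20):
    """Repair a violating input by binary-searching the largest perturbation
    scale along the segment from x_orig to x that remains compliant.
    Assumes x_orig itself satisfies the constraints."""
    lo, hi = 0.0, 1.0
    best = x_orig.copy()
    for _ in range(steps):
        mid = (lo + hi) / 2
        cand = x_orig + mid * (x - x_orig)
        if is_compliant(cand, constraints):
            best, lo = cand, mid
        else:
            hi = mid
    return best

def craft(x, grad_sign, eps, constraints):
    """One FGSM-style step followed by constraint projection."""
    x_adv = x + eps * grad_sign
    if is_compliant(x_adv, constraints):
        return x_adv
    return project(x_adv, x, constraints)

# Usage with made-up values: the raw step violates the flag/duration
# constraint, so the perturbation is scaled back until it complies.
x = np.array([10.0, 500.0, 800.0, 1.0, 2.5])        # hypothetical valid input
g = np.sign(np.array([1.0, 1.0, -1.0, 0.0, -1.0]))  # sign of a loss gradient
x_adv = craft(x, g, eps=50.0, constraints=constraints)
assert is_compliant(x_adv, constraints)
```

The projection step here is one simple choice among many; it preserves the perturbation's direction while guaranteeing compliance, at the cost of reducing its magnitude, which reflects the abstract's observation that constraints make valid adversarial examples harder to generate.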