Generating feasible adversarial examples is necessary to properly assess models that operate in constrained feature spaces. However, enforcing domain constraints in attacks originally designed for computer vision remains a challenging task. We propose a unified framework to generate feasible adversarial examples that satisfy given domain constraints. Our framework handles both linear and non-linear constraints. We instantiate it as two algorithms: a gradient-based attack that incorporates the constraints into the loss function it maximizes, and a multi-objective search algorithm that jointly targets misclassification, perturbation minimization, and constraint satisfaction. We show that our approach is effective across four domains, with success rates of up to 100% in settings where state-of-the-art attacks fail to generate a single feasible example. In addition to adversarial retraining, we propose introducing engineered non-convex constraints as a defense to improve the adversarial robustness of models. We demonstrate that this new defense is as effective as adversarial retraining. Our framework forms a starting point for research on constrained adversarial attacks and provides relevant baselines and datasets that future research can exploit.
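To make the two instantiations concrete, the sketch below illustrates the underlying ideas rather than the paper's actual implementation: a PGD-style loop whose maximized loss is penalized by a constraint-violation term, plus the three objective values a multi-objective search would optimize. The specific constraints inside `constraint_violation`, the penalty weight `lam`, and all function names are illustrative assumptions.

```python
# Minimal sketch, assuming a differentiable classifier `model` and a
# tabular input x. The constraints below are placeholders, not the
# paper's: feature 0 in [0, 1], and feature 1 <= 2 * feature 2.
import torch
import torch.nn.functional as F


def constraint_violation(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical per-example penalty: 0 when all constraints hold,
    growing with the magnitude of each violation."""
    box = F.relu(x[:, 0] - 1.0) + F.relu(-x[:, 0])   # box constraint
    linear = F.relu(x[:, 1] - 2.0 * x[:, 2])          # linear relation
    return box + linear


def constrained_gradient_attack(model, x, y, eps=0.1, alpha=0.01,
                                steps=40, lam=1.0):
    """Gradient ascent on cross-entropy minus a constraint penalty,
    projected onto an L_inf ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = (F.cross_entropy(model(x_adv), y)
                - lam * constraint_violation(x_adv).mean())
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.detach()
    return x_adv


def fitness(model, x_orig, x_adv, y):
    """Three objectives for a multi-objective search (all minimized):
    true-class probability, perturbation size, constraint violation."""
    prob = model(x_adv).softmax(dim=-1)[torch.arange(len(y)), y]
    dist = (x_adv - x_orig).norm(p=2, dim=-1)
    return prob, dist, constraint_violation(x_adv)
```

In a full multi-objective instantiation, the three values returned by `fitness` would drive an evolutionary search (e.g., an NSGA-style genetic algorithm) over candidate perturbations instead of a gradient loop.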