Intentionally crafted adversarial samples effectively exploit weaknesses in deep neural networks. A standard approach in adversarial robustness assumes a threat model in which a sample is minimally perturbed so that the model's output changes. These sensitivity attacks exploit the model's sensitivity to task-irrelevant features. A second class of adversarial samples is crafted via invariance attacks, which exploit the model's underestimation of task-relevant features. Prior work has indicated a tradeoff in defending against both attack types under a strictly L_p-bounded defense. To promote robustness to both attack types beyond Euclidean distance metrics, we use metric learning to frame adversarial regularization as an optimal transport problem. Our preliminary results indicate that, within this framework, regularizing over invariance perturbations improves defense against both invariance and sensitivity attacks.
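The abstract does not spell out the regularizer, so the following is a minimal illustrative sketch, assuming an entropic (Sinkhorn) approximation of the optimal transport distance between clean and perturbed feature embeddings, used as an additive penalty on the task loss. The function name `sinkhorn_distance` and the hyperparameters `epsilon` and `n_iters` are hypothetical, not taken from the paper; in the paper's setting, the squared-Euclidean ground cost would be replaced by a learned metric.

```python
import numpy as np

def sinkhorn_distance(x, y, epsilon=0.1, n_iters=100):
    """Entropic-regularized OT distance between two point clouds.

    x: (n, d) array of clean feature embeddings.
    y: (m, d) array of perturbed feature embeddings.
    Hypothetical sketch; not the paper's actual regularizer.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared-Euclidean ground cost; a learned metric
    # (as in the metric-learning framing) could replace this.
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    # Uniform marginals over the two batches.
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / epsilon)  # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):     # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    # Transport plan P = diag(u) K diag(v); return <P, cost>.
    plan = u[:, None] * K * v[None, :]
    return np.sum(plan * cost)

# Usage: the OT distance between clean and invariance-perturbed
# feature batches serves as a regularization term on the task loss.
rng = np.random.default_rng(0)
clean_feats = rng.normal(size=(32, 16))
perturbed_feats = clean_feats + 0.1 * rng.normal(size=(32, 16))
print(f"OT regularizer: {sinkhorn_distance(clean_feats, perturbed_feats):.4f}")
```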