Optimal Transport as a Defense Against Adversarial Attacks

Deep learning classifiers are now known to have flaws in the representations of their classes. Adversarial attacks can find a human-imperceptible perturbation for a given image that will mislead a trained model. The most effective defenses against such attacks train on generated adversarial examples to learn their distribution. Previous work aimed to align original and adversarial image representations, in the same way as domain adaptation, to improve robustness. Yet these methods only partially align the representations, using approaches that do not reflect the geometry of the space and of the distributions. In addition, it is difficult to accurately compare robustness between defended models. Until now, they have been evaluated using a fixed perturbation size. However, defended models may react differently to variations of this perturbation size. In this paper, we take the domain adaptation analogy a step further by exploiting optimal transport theory. We propose to use a loss between distributions that faithfully reflects the ground distance. This leads to SAT (Sinkhorn Adversarial Training), a more robust defense against adversarial attacks. Then, we propose to quantify more precisely the robustness of a model to adversarial attacks over a wide range of perturbation sizes using a different metric, the Area Under the Accuracy Curve (AUAC). We perform extensive experiments on both CIFAR-10 and CIFAR-100 datasets and show that our defense is globally more robust than the state-of-the-art.
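To make the two ingredients named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a squared Euclidean ground cost, uniform weights over each batch, and an AUAC normalized by the width of the perturbation range; none of these choices is stated in the abstract. The function names `sinkhorn_cost` and `auac` and the parameters `eps` and `n_iters` are hypothetical.

```python
import numpy as np


def _logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    z_max = z.max(axis=axis, keepdims=True)
    out = z_max + np.log(np.exp(z - z_max).sum(axis=axis, keepdims=True))
    return out.squeeze(axis)


def sinkhorn_cost(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized transport cost between two point clouds.

    x: (n, d) array, e.g. representations of clean images.
    y: (m, d) array, e.g. representations of adversarial images.
    Assumptions: uniform weights, squared Euclidean ground cost.
    """
    n, m = x.shape[0], y.shape[0]
    log_a = np.full(n, -np.log(n))  # log of uniform weights on x
    log_b = np.full(m, -np.log(m))  # log of uniform weights on y
    # Pairwise squared Euclidean distances, shape (n, m).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    f = np.zeros(n)  # dual potential attached to x
    g = np.zeros(m)  # dual potential attached to y
    for _ in range(n_iters):
        # Log-domain Sinkhorn updates: each step enforces one marginal.
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - cost) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - cost) / eps, axis=0)
    # Recover the transport plan from the potentials.
    log_plan = (log_a[:, None] + log_b[None, :]
                + (f[:, None] + g[None, :] - cost) / eps)
    return float((np.exp(log_plan) * cost).sum())


def auac(epsilons, accuracies):
    """Area under the accuracy-vs-perturbation-size curve (trapezoid rule).

    Assumption: the area is divided by the width of the epsilon range,
    so a model with accuracy 1.0 at every perturbation size scores 1.0.
    """
    eps_arr = np.asarray(epsilons, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    area = np.sum((acc[1:] + acc[:-1]) / 2.0 * np.diff(eps_arr))
    return float(area / (eps_arr[-1] - eps_arr[0]))
```

In an adversarial training loop, a term such as `sinkhorn_cost(clean_features, adversarial_features)` would presumably be added to the classification loss, using a differentiable framework rather than plain NumPy. For `auac`, the accuracies would come from evaluating one defended model under the same attack at each perturbation size in `epsilons`.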
