White-box adversarial perturbations are typically generated by iterative optimization algorithms, most often by minimizing an adversarial loss over an $\ell_p$ neighborhood of the original image, the so-called distortion set. Constraining the adversarial search with different norms results in disparately structured adversarial examples. Here we explore several distortion sets with structure-enhancing algorithms. These new structures for adversarial examples may pose challenges for both provable and empirical robustness mechanisms. Because adversarial robustness is still largely an empirical field, defense mechanisms should reasonably be evaluated against differently structured attacks as well. Moreover, these structured adversarial perturbations may allow for larger distortion sizes than their $\ell_p$ counterparts while remaining imperceptible, or perceptible only as natural distortions of the image. We demonstrate in this work that the proposed structured adversarial examples can significantly reduce the classification accuracy of adversarially trained classifiers while exhibiting low $\ell_2$ distortion. For instance, on the ImageNet dataset the structured attacks drop the accuracy of an adversarially trained model to near zero with only 50\% of the $\ell_2$ distortion of standard white-box attacks such as PGD. As a byproduct, our findings on structured adversarial examples can be used for adversarial regularization, making models more robust or improving their generalization performance on structurally different datasets.
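For concreteness, the following is a minimal sketch of the kind of white-box attack referenced above: PGD, which alternates gradient ascent on the adversarial loss with projection onto an $\ell_\infty$ distortion set. It assumes a PyTorch classifier \texttt{model} and inputs normalized to $[0,1]$; the budget \texttt{eps}, step size \texttt{alpha}, and iteration count are illustrative choices, not the settings used in our experiments.

\begin{verbatim}
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """PGD on the l_inf ball of radius eps around x (illustrative sketch)."""
    # Random start inside the distortion set.
    x_adv = x.clone().detach()
    x_adv = x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)
    x_adv = x_adv.clamp(0, 1)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # Ascend the adversarial loss with a signed-gradient step.
            x_adv = x_adv + alpha * grad.sign()
            # Project back onto the l_inf ball (the distortion set) ...
            x_adv = x + (x_adv - x).clamp(-eps, eps)
            # ... and onto the valid pixel range.
            x_adv = x_adv.clamp(0, 1)
        x_adv = x_adv.detach()
    return x_adv
\end{verbatim}

Swapping the projection step for a projection onto a different distortion set (e.g., an $\ell_2$ ball, or one of the structured sets explored in this work) changes the geometry, and hence the structure, of the resulting perturbations.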