Deep neural networks can be fooled by adversarial examples that differ only trivially from the original samples. To keep the difference imperceptible to human eyes, researchers bound the adversarial perturbations by the $\ell_\infty$ norm, which now commonly serves as the standard for aligning the strength of different attacks in a fair comparison. However, we argue that the $\ell_\infty$ norm alone is insufficient for measuring attack strength, because even with a fixed $\ell_\infty$ distance, the $\ell_2$ distance also greatly affects the attack transferability between models. This finding yields a deeper understanding of the attack mechanism: several existing methods attack black-box models better partly because they craft perturbations with 70\% to 130\% larger $\ell_2$ distances. Since larger perturbations naturally lead to better transferability, we advocate that the strength of attacks should be measured by both the $\ell_\infty$ and the $\ell_2$ norm simultaneously. Our proposal is firmly supported by extensive experiments on the ImageNet dataset covering 7 attacks, 4 white-box models, and 9 black-box models.
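As a minimal illustrative sketch (not the paper's evaluation code; the helper name and toy perturbations are assumptions), the snippet below shows how a perturbation's strength can be reported under both norms, and why two attacks with the same $\ell_\infty$ budget can have very different $\ell_2$ distances:

```python
import numpy as np

def perturbation_norms(x_adv: np.ndarray, x_orig: np.ndarray):
    """Return (l_inf, l_2) distances between an adversarial example and the original."""
    delta = (x_adv - x_orig).ravel()
    l_inf = np.abs(delta).max()      # maximum per-pixel change (the usual attack budget)
    l_2 = np.linalg.norm(delta)      # overall perturbation energy
    return l_inf, l_2

# Toy example: a sparse and a dense perturbation share the same l_inf budget,
# yet their l_2 distances differ by orders of magnitude.
rng = np.random.default_rng(0)
x = rng.random((3, 224, 224))
eps = 8 / 255
delta_sparse = np.zeros_like(x)
delta_sparse[0, :10, :10] = eps                              # only a few pixels changed
delta_dense = eps * np.sign(rng.standard_normal(x.shape))    # every pixel changed by eps
for name, d in [("sparse", delta_sparse), ("dense", delta_dense)]:
    linf, l2 = perturbation_norms(x + d, x)
    print(f"{name}: l_inf={linf:.4f}, l_2={l2:.2f}")
```

Reporting the pair $(\ell_\infty, \ell_2)$ in this way is what the abstract advocates for comparing attacks fairly.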