This paper substantially extends our work published at ECCV, in which an intermediate-level attack was proposed to improve the transferability of baseline adversarial examples. Specifically, we advocate a framework that establishes a direct linear mapping from the intermediate-level discrepancies (between adversarial features and benign features) to the prediction loss of the adversarial example. By delving into the core components of this framework, we show that 1) a variety of linear regression models can be used to establish the mapping, 2) the magnitude of the final intermediate-level adversarial discrepancy is correlated with transferability, and 3) performance can be boosted further by performing multiple runs of the baseline attack with random initialization. Leveraging these findings, we achieve new state-of-the-art results on transfer-based $\ell_\infty$ and $\ell_2$ attacks. Our code is publicly available at https://github.com/qizhangli/ila-plus-plus-lr.
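The two stages described in the abstract can be illustrated with a toy sketch. It is not the released implementation (see the repository for that) and it rests on several assumptions: ridge regression stands in for the "variety of linear regression models", a fixed matrix `A` stands in for the intermediate layer of a real network, and the names `fit_linear_map` and `maximize_projection` are hypothetical. Stage one fits a weight vector `w` that maps feature discrepancies collected during a baseline attack to the observed prediction losses; stage two searches for an $\ell_\infty$-bounded perturbation that maximizes the projection of the discrepancy onto `w`.

```python
import numpy as np


def fit_linear_map(discrepancies, losses, ridge=1.0):
    """Ridge regression from feature discrepancies (T x d) to losses (T,).

    Solves (H^T H + ridge * I) w = H^T r for w.
    """
    H = np.asarray(discrepancies, dtype=float)
    r = np.asarray(losses, dtype=float)
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + ridge * np.eye(d), H.T @ r)


def maximize_projection(A, w, eps, steps=10):
    """Sign-gradient ascent on w^T (A (x + delta) - A x) with ||delta||_inf <= eps.

    Assumption: the feature extractor is the linear map x -> A x, so the
    gradient with respect to delta is the constant vector A^T w.
    """
    grad = A.T @ w
    delta = np.zeros(A.shape[1])
    step = eps / steps
    for _ in range(steps):
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta


# Toy data (all values synthetic, for illustration only).
rng = np.random.default_rng(0)
T, d, n = 20, 5, 8                 # baseline iterations, feature dim, input dim
w_true = rng.normal(size=d)        # hypothetical ground-truth linear map
H = rng.normal(size=(T, d))        # discrepancies h_t - h_0 from a baseline attack
r = H @ w_true                     # noise-free losses, so the fit is checkable

w_hat = fit_linear_map(H, r, ridge=1e-8)

A = rng.normal(size=(d, n))        # stand-in for an intermediate layer
eps = 8.0 / 255.0
delta = maximize_projection(A, w_hat, eps)
```

With a real network the feature map is nonlinear, so the gradient in the second stage would have to be recomputed by backpropagation at every step rather than held constant as it is here.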