There is broad consensus among researchers studying adversarial examples that achieving transferable targeted attacks is extremely difficult. Existing research strives for transferable targeted attacks by resorting to complex losses and even massive training. In this paper, we take a second look at transferable targeted attacks and show that their difficulty has been overestimated due to a blind spot in conventional evaluation procedures: current work unreasonably restricts attack optimization to only a few iterations. We show that targeted attacks converge slowly to optimal transferability and improve considerably when given more iterations. We also demonstrate that an attack that simply maximizes the target logit performs surprisingly well, remarkably surpassing more complex losses and even achieving performance comparable to the state of the art, which requires massive training with a sophisticated multi-term loss. We further validate this logit attack in a realistic ensemble setting and in a real-world attack against the Google Cloud Vision API. The logit attack produces perturbations that reflect the target semantics, which, as we demonstrate, makes it possible to create targeted universal adversarial perturbations without additional training images.
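To make the core idea concrete, the following is a minimal sketch of an iterative targeted attack driven only by the target-class logit, in the spirit of the logit attack described above. It is not the authors' exact implementation (which additionally uses transfer-enhancing techniques such as momentum and input diversity); the model `model`, the input batch `images` in [0, 1], the integer `target_labels`, and the hyperparameters `eps`, `alpha`, and `n_iters` are illustrative assumptions.

```python
import torch

def logit_attack(model, images, target_labels, eps=16/255, alpha=2/255, n_iters=300):
    """Targeted attack that simply maximizes the target-class logit,
    run for many iterations so that transferability can converge."""
    adv = images.clone().detach()
    for _ in range(n_iters):
        adv.requires_grad_(True)
        logits = model(adv)
        # Logit loss: use the target-class logit directly (no softmax / cross-entropy).
        target_logits = logits.gather(1, target_labels.view(-1, 1)).squeeze(1)
        loss = target_logits.sum()
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            # Gradient-ascent step on the target logit, then project back into the
            # L-infinity epsilon-ball around the original images and the valid pixel range.
            adv = adv + alpha * grad.sign()
            adv = images + (adv - images).clamp(-eps, eps)
            adv = adv.clamp(0, 1).detach()
    return adv
```

The key design choice is the loss itself: rather than a cross-entropy or margin-based objective, the attack maximizes the raw target logit, and the large iteration budget reflects the observation that targeted transferability converges slowly.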