Existing transfer attack methods commonly assume that the attacker knows information about the training set (e.g., the label set and the input size) of the black-box victim models, an assumption that is often unrealistic because the attacker may have no access to this information. In this paper, we define a Generalized Transferable Attack (GTA) problem, in which the attacker does not know this information and is required to attack any randomly encountered images that may come from unknown datasets. To solve the GTA problem, we propose a novel Image Classification Eraser (ICE) that trains a dedicated attacker to erase the classification information of any image from an arbitrary dataset. Experiments on several datasets demonstrate that ICE greatly outperforms existing transfer attacks on GTA, and show that ICE uses similar texture-like noise to perturb different images from different datasets. Moreover, fast Fourier transform analysis indicates that the main components of each ICE noise pattern are three sine waves, one each for the R, G, and B image channels. Inspired by this finding, we then design a novel Sine Attack (SA) method that optimizes the three sine waves directly. Experiments show that SA performs comparably to ICE, indicating that the three sine waves are effective and sufficient to break DNNs under the GTA setting.
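To make the sine-wave observation concrete, the following is a minimal sketch, not the authors' implementation. It builds one 2-D sine wave per RGB channel and recovers each wave's dominant frequency with a 2-D FFT, mirroring the kind of analysis described above. The function names, the parameterization (frequency, direction, phase), and the L-infinity budget `eps` are all assumptions made for illustration; the abstract does not specify them, and the optimization of the wave parameters is not shown.

```python
import numpy as np


def sine_noise(height, width, freqs, angles, phases, eps=8 / 255):
    """Build an (H, W, 3) perturbation with one 2-D sine wave per RGB channel.

    All parameters are hypothetical choices for this sketch:
    freqs  : cycles across the image, one value per channel
    angles : wave direction in radians, one value per channel
    phases : phase offset in radians, one value per channel
    eps    : amplitude, acting as an assumed L-infinity budget
    """
    # Normalized pixel coordinates in [0, 1), each of shape (H, W).
    ys, xs = np.meshgrid(
        np.arange(height) / height, np.arange(width) / width, indexing="ij"
    )
    noise = np.empty((height, width, 3))
    for c in range(3):
        # Project pixel coordinates onto the wave direction of channel c.
        t = xs * np.cos(angles[c]) + ys * np.sin(angles[c])
        noise[..., c] = eps * np.sin(2 * np.pi * freqs[c] * t + phases[c])
    return noise


def dominant_frequency(channel):
    """Return the (fy, fx) magnitude of the strongest non-DC FFT component."""
    spectrum = np.abs(np.fft.fft2(channel))
    spectrum[0, 0] = 0.0  # ignore the mean (DC) term
    fy, fx = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    h, w = channel.shape
    # Fold conjugate-symmetric bins so that k and N - k report the same value.
    return int(min(fy, h - fy)), int(min(fx, w - fx))


def apply_noise(image, noise):
    """Add the perturbation to an image in [0, 1] and clip to the valid range."""
    return np.clip(image + noise, 0.0, 1.0)


if __name__ == "__main__":
    noise = sine_noise(64, 64, freqs=[5, 9, 13], angles=[0.0, 0.0, 0.0],
                       phases=[0.0, 1.0, 2.0])
    image = np.random.default_rng(0).random((64, 64, 3))
    adversarial = apply_noise(image, noise)
    for c, name in enumerate("RGB"):
        print(name, dominant_frequency(noise[..., c]))
```

Because the noise depends only on the image height and width, the same three waves can be regenerated for an image of any size, which is in the spirit of the GTA setting where the input size is unknown. In an actual attack the frequencies, directions, and phases would be optimized rather than fixed by hand as they are here.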