It has been observed that Deep Neural Networks (DNNs) are vulnerable to transfer attacks in the query-free black-box setting. However, all previous studies on transfer attacks assume that the white-box surrogate models possessed by the attacker and the black-box victim models are trained on the same dataset, which means the attacker implicitly knows the label set and the input size of the victim model. This assumption is usually unrealistic: the attacker may not know the dataset used by the victim model and, further, must attack any randomly encountered images, which may not come from that dataset. Therefore, in this paper we define a new Generalized Transferable Attack (GTA) problem, in which the attacker possesses a set of surrogate models trained on different datasets (with different label sets and image sizes), none of which equals the dataset used by the victim model. We then propose a novel method called Image Classification Eraser (ICE) to erase the classification information of any encountered image from an arbitrary dataset. Extensive experiments on Cifar-10, Cifar-100, and TieredImageNet demonstrate the effectiveness of ICE on the GTA problem. Furthermore, we show that existing transfer attack methods can be modified to tackle the GTA problem, but they perform significantly worse than ICE.
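The abstract does not detail how ICE works, so the following is only a minimal sketch of the GTA setting itself: a baseline ensemble transfer attack (iterative FGSM over heterogeneous surrogates), adapted under the stated assumptions that each surrogate may expect a different input size and output a different label set. All names here (`gta_ensemble_attack`, the `(model, size)` pairs) are hypothetical illustrations, not the paper's method; PyTorch is assumed.

```python
import torch
import torch.nn.functional as F

def gta_ensemble_attack(image, surrogates, eps=8/255, alpha=2/255, steps=10):
    """Baseline untargeted transfer attack for the GTA setting.

    image:      (N, C, H, W) tensor in [0, 1]
    surrogates: list of (model, input_size) pairs; each model may be
                trained on a different dataset with its own label set.
    """
    x_adv = image.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = 0.0
        for model, size in surrogates:
            # Resize per surrogate, since input sizes differ across datasets.
            x_in = F.interpolate(x_adv, size=size, mode='bilinear',
                                 align_corners=False)
            logits = model(x_in)
            # No ground-truth label exists for an arbitrary encountered
            # image, so push the input away from each surrogate's own
            # current prediction (a pseudo-label).
            pseudo = logits.detach().argmax(dim=1)
            loss = loss + F.cross_entropy(logits, pseudo)
        grad, = torch.autograd.grad(loss, x_adv)
        # Ascend the summed loss, then project back into the L-inf ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = image + (x_adv - image).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

Per the paper's claim, such modified existing attacks handle GTA only poorly compared with ICE, which instead erases the classification information of the image.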