Model inversion attacks (MIAs) aim to create synthetic images that reflect the class-wise characteristics of a target classifier's training data by exploiting the model's learned knowledge. Previous research has developed generative MIAs that use generative adversarial networks (GANs) as image priors tailored to a specific target model. This makes the attacks time- and resource-consuming, inflexible, and susceptible to distributional shifts between datasets. To overcome these drawbacks, we present Plug & Play Attacks, which loosen the dependency between the target model and the image prior and enable a single trained GAN to attack a broad range of targets with only minor attack adjustments. Moreover, we show that powerful MIAs are possible even with publicly available pre-trained GANs and under strong distributional shifts, whereas previous approaches fail to produce meaningful results in these settings. Our extensive evaluation confirms the improved robustness and flexibility of Plug & Play Attacks and their ability to create high-quality images that reveal sensitive class characteristics.