Model inversion attacks (MIAs) aim to create synthetic images that reflect the class-wise characteristics of a target classifier's private training data by exploiting the model's learned knowledge. Previous research has developed generative MIAs that use generative adversarial networks (GANs) as image priors tailored to a specific target model. This makes the attacks time- and resource-intensive, inflexible, and susceptible to distributional shifts between datasets. To overcome these drawbacks, we present Plug & Play Attacks, which relax the dependency between the target model and the image prior, enabling the use of a single GAN to attack a wide range of targets with only minor adjustments to the attack. Moreover, we show that powerful MIAs are possible even with publicly available pre-trained GANs and under strong distributional shifts, settings in which previous approaches fail to produce meaningful results. Our extensive evaluation confirms the improved robustness and flexibility of Plug & Play Attacks and their ability to create high-quality images revealing sensitive class characteristics.