Model extraction attacks aim to duplicate a machine learning model through query access to a target model. Early studies mainly focus on discriminative models, and despite their success, model extraction attacks against generative models remain largely unexplored. In this paper, we systematically study the feasibility of model extraction attacks against generative adversarial networks (GANs). Specifically, we first define accuracy and fidelity for model extraction attacks against GANs. We then study these attacks from the perspectives of accuracy extraction and fidelity extraction, according to the adversary's goals and background knowledge. We further conduct a case study in which an adversary transfers knowledge from an extracted model, which steals a state-of-the-art GAN trained on more than 3 million images, to new domains, thereby broadening the scope of applications of model extraction attacks. Finally, we propose effective defense techniques to safeguard GANs, considering the trade-off between the utility and security of GAN models.