Machine learning models are typically made available to potential client users via inference APIs. Model extraction attacks occur when a malicious client uses information gleaned from queries to the inference API of a victim model $F_V$ to build a surrogate model $F_A$ with comparable functionality. Recent research has shown successful model extraction attacks against image classification and NLP models. In this paper, we show the first model extraction attack against real-world generative adversarial network (GAN) image translation models. We present a framework for conducting model extraction attacks against image translation models, and show that the adversary can successfully extract functional surrogate models. The adversary is not required to know $F_V$'s architecture or any other information about it beyond its intended image translation task, and queries $F_V$'s inference interface using data drawn from the same domain as the training data for $F_V$. We evaluate the effectiveness of our attacks using three different instances of two popular categories of image translation: (1) Selfie-to-Anime and (2) Monet-to-Photo (image style transfer), and (3) Super-Resolution (image super-resolution). Using standard performance metrics for GANs, we show that our attacks are effective in each of the three cases: the differences in performance between $F_V$ and $F_A$, measured against the target, fall within the following ranges: Selfie-to-Anime: FID $13.36-68.66$, Monet-to-Photo: FID $3.57-4.40$, and Super-Resolution: SSIM: $0.06-0.08$ and PSNR: $1.43-4.46$. Furthermore, we conducted a large-scale user study (125 participants) on Selfie-to-Anime and Monet-to-Photo to show that human perception of the images produced by the victim and surrogate models can be considered equivalent, within an equivalence bound of Cohen's $d=0.3$.
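The attack described above reduces to two steps: label the attacker's own in-domain images by querying the victim's inference API, then train a surrogate image-to-image model on the resulting input-output pairs. Below is a minimal PyTorch sketch of that loop. The names `query_victim`, `SurrogateGenerator`, and `extract_surrogate` are hypothetical placeholders, and the toy encoder-decoder and plain L1 reconstruction loss stand in for whatever GAN architecture and training objective a real attack would use; the abstract does not specify these details.

```python
# Sketch of the extraction loop: query F_V for labels, then train F_A on the
# collected (input, output) pairs. Assumes images are float tensors in [-1, 1].
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def query_victim(x: torch.Tensor) -> torch.Tensor:
    """Placeholder for the victim model F_V's inference API (hypothetical)."""
    raise NotImplementedError("Replace with a call to the real inference API.")


class SurrogateGenerator(nn.Module):
    """Toy encoder-decoder surrogate F_A; a real attack would use a full
    image-translation GAN generator instead."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, channels, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def extract_surrogate(attack_images: torch.Tensor, epochs: int = 10) -> nn.Module:
    # Step 1: label the attacker's in-domain images by querying F_V.
    with torch.no_grad():
        victim_outputs = torch.cat(
            [query_victim(x.unsqueeze(0)) for x in attack_images]
        )
    pairs = DataLoader(
        TensorDataset(attack_images, victim_outputs), batch_size=16, shuffle=True
    )
    # Step 2: train F_A to mimic F_V on the collected input-output pairs.
    surrogate = SurrogateGenerator()
    opt = torch.optim.Adam(surrogate.parameters(), lr=2e-4)
    loss_fn = nn.L1Loss()  # pixel loss only; a GAN attack would add adversarial terms
    for _ in range(epochs):
        for x, y in pairs:
            opt.zero_grad()
            loss_fn(surrogate(x), y).backward()
            opt.step()
    return surrogate
```

The key design point is that the attacker never needs gradients, parameters, or architecture details of $F_V$: the victim contributes only black-box input-output pairs, which is exactly what a public inference API exposes.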