Machine learning models are typically made available to potential client users via inference APIs. Model extraction attacks occur when a malicious client uses information gleaned from queries to the inference API of a victim model $F_V$ to build a surrogate model $F_A$ with comparable functionality. Recent research has shown successful model extraction of image classification and natural language processing models. In this paper, we show the first model extraction attack against real-world generative adversarial network (GAN) image translation models. We present a framework for conducting such attacks, and show that an adversary can successfully extract functional surrogate models by querying $F_V$ using data from the same domain as the training data for $F_V$. The adversary need not know $F_V$'s architecture or any other information about it beyond its intended task. We evaluate the effectiveness of our attacks using three different instances of two popular categories of image translation: (1) Selfie-to-Anime and (2) Monet-to-Photo (image style transfer), and (3) Super-Resolution (super resolution). Using standard performance metrics for GANs, we show that our attacks are effective. Furthermore, we conducted a large-scale (125 participants) user study on Selfie-to-Anime and Monet-to-Photo to show that human perception of the images produced by $F_V$ and $F_A$ can be considered equivalent, within an equivalence bound of Cohen's d = 0.3. Finally, we show that existing defenses against model extraction attacks (watermarking, adversarial examples, poisoning) do not extend to image translation models.
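To make the query-based extraction step concrete, the following is a minimal PyTorch sketch of the attack loop described above, under loudly stated assumptions: `query_victim` is a hypothetical wrapper around $F_V$'s inference API, the tiny encoder-decoder stands in for a full GAN generator $F_A$, and random tensors stand in for same-domain query images. It is an illustration of the black-box setting, not the paper's actual surrogate architecture or training procedure.

```python
# Hedged sketch of black-box model extraction against an image translation API.
# Assumptions (not from the paper): query_victim() is a hypothetical API wrapper;
# the surrogate below is a toy stand-in for a real GAN generator F_A.
import torch
import torch.nn as nn

def query_victim(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the victim F_V: a real attack would send `x`
    to the inference endpoint and return the translated image it serves back."""
    return x  # placeholder response for illustration only

# Tiny encoder-decoder surrogate; the paper's F_A would be a full GAN generator.
surrogate = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),
)
opt = torch.optim.Adam(surrogate.parameters(), lr=2e-4)
loss_fn = nn.L1Loss()  # simple pixel-wise proxy for the surrogate's training loss

# Attack loop: query F_V with same-domain images, then fit F_A to the
# resulting (input, output) pairs. No access to F_V's architecture,
# weights, gradients, or training data is assumed.
for step in range(100):
    x = torch.rand(8, 3, 64, 64)       # stand-in for same-domain query images
    with torch.no_grad():
        y_victim = query_victim(x)     # supervision comes only from the API
    loss = loss_fn(surrogate(x), y_victim)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The property the sketch captures is that the surrogate is supervised solely by the API's responses: the adversary only needs query access and in-domain inputs, which is what makes the attack realistic against deployed inference APIs.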