Machine learning models deployed as a service (MLaaS) are susceptible to model stealing attacks, where an adversary attempts to replicate the model within a restricted-access framework. While existing attacks demonstrate near-perfect clone-model performance using the softmax predictions of the classification network, most APIs allow access only to the top-1 label. In this work, we show that it is indeed possible to steal machine learning models by accessing only top-1 predictions (Hard Label setting), without access to model gradients (Black-Box setting) or even the training dataset (Data-Free setting), all within a low query budget. We propose a novel GAN-based framework that trains the student and generator in tandem to steal the model effectively, overcoming the challenge of the hard-label setting by utilizing gradients of the clone network as a proxy for the victim's gradients. We overcome the large query costs associated with a typical Data-Free setting by utilizing publicly available (potentially unrelated) datasets as a weak image prior. We additionally show that even in the absence of such data, it is possible to achieve state-of-the-art results within a low query budget using synthetically crafted samples. We are also the first to demonstrate the scalability of model stealing in this restricted-access setting to a 100-class dataset.
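To make the tandem training concrete, below is a minimal PyTorch sketch of one step of the loop the abstract describes: the generator synthesizes queries, the victim answers with hard (top-1) labels only, the clone is trained on those labels with cross-entropy, and the generator is updated by backpropagating through the clone as a proxy for the inaccessible victim gradients. The objective used for the generator here (per-sample entropy plus batch class diversity) is one plausible surrogate, and all names (`victim_top1`, `train_step`, the loss weights) are illustrative assumptions rather than the paper's exact implementation.

```python
# Hedged sketch of tandem clone/generator training under hard-label access.
import torch
import torch.nn.functional as F

def victim_top1(victim, x):
    """Query the black-box victim; only the top-1 (hard) label is returned."""
    with torch.no_grad():
        return victim(x).argmax(dim=1)

def train_step(generator, clone, victim, g_opt, c_opt,
               batch_size, z_dim, device):
    # --- Generator step: gradients of the clone act as a proxy for the
    # victim's gradients, which are unavailable in the black-box setting.
    z = torch.randn(batch_size, z_dim, device=device)
    x = generator(z)
    probs = F.softmax(clone(x), dim=1)
    # Encourage samples the clone is uncertain about (near its decision
    # boundary) and class-diverse batches; an assumed surrogate objective.
    sample_entropy = -(probs * probs.clamp_min(1e-8).log()).sum(1).mean()
    mean_probs = probs.mean(0)
    class_diversity = -(mean_probs * mean_probs.clamp_min(1e-8).log()).sum()
    g_loss = -sample_entropy - class_diversity
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

    # --- Clone step: supervised only by the victim's hard labels,
    # costing one top-1 query per generated sample.
    z = torch.randn(batch_size, z_dim, device=device)
    x = generator(z).detach()
    y = victim_top1(victim, x)
    c_loss = F.cross_entropy(clone(x), y)
    c_opt.zero_grad()
    c_loss.backward()
    c_opt.step()
    return g_loss.item(), c_loss.item()
```

In this sketch the same structure accommodates the weak image prior mentioned above: batches drawn from a public (potentially unrelated) dataset can simply be mixed into the clone step alongside generated samples, reducing the number of victim queries the generator alone would require.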