Model extraction attacks have become a serious threat to service providers that deploy machine learning models. We consider an adversarial setting for preventing model extraction under the assumption that attackers make their best guess at the service provider's model using query access, and propose building a surrogate model that drives the predictions of the attacker's model significantly away from those of the true model. We formulate the problem as a non-convex constrained bilevel optimization problem and show that, for kernel models, it can be transformed into a non-convex 1-quadratically constrained quadratic program that admits a polynomial-time algorithm for finding the global optimum. Moreover, we give a tractable transformation and an algorithm for more complicated models trained with stochastic gradient descent-based algorithms. Numerical experiments show that the surrogate model performs well compared with existing defense models when the difference between the attacker's and the service provider's distributions is large. We also empirically confirm the generalization ability of the surrogate model.