We study black-box model stealing attacks in which the attacker can query a machine learning model only through publicly available APIs. Specifically, we aim to design a black-box model extraction attack that uses a minimal number of queries to create an informative and distributionally equivalent replica of the target model. First, we define distributionally equivalent and max-information model extraction attacks. Then, we reduce both attacks to a variational optimisation problem. The attacker solves this problem to select the most informative queries, which simultaneously maximise the entropy and reduce the mismatch between the target and the stolen models. This leads us to an active sampling-based query selection algorithm, Marich. We evaluate Marich on different text and image datasets and on different models, including BERT and ResNet18. Marich extracts models that achieve $69-96\%$ of the true model's accuracy using $1{,}070 - 6{,}950$ samples from publicly available query datasets, which are different from the private training datasets. Models extracted by Marich yield prediction distributions that are $\sim2-4\times$ closer to the target's distribution than those of existing active sampling-based algorithms. The extracted models also lead to $85-95\%$ accuracy under membership inference attacks. Experimental results validate that Marich is query-efficient and capable of performing task-accurate, high-fidelity, and informative model extraction.
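To make the two quantities named above concrete, here is a minimal sketch of entropy-based query ranking and of a distribution mismatch measure. It is not the Marich algorithm: the function names (`select_queries`, `mean_mismatch`), the use of KL divergence as the mismatch, and the toy probabilities are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one discrete predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions over the same classes.

    eps guards against division by zero when q assigns zero mass.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def select_queries(pool_probs, budget):
    """Hypothetical helper: indices of the `budget` pool points whose
    predictions under the current replica have the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


def mean_mismatch(target_probs, replica_probs):
    """Hypothetical helper: average KL divergence between the target's
    and the replica's predictions on the already-queried points."""
    pairs = list(zip(target_probs, replica_probs))
    return sum(kl_divergence(t, r) for t, r in pairs) / len(pairs)


# Toy replica predictions on three candidate queries (assumed values).
pool = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
chosen = select_queries(pool, budget=2)  # the two most uncertain points
```

In this toy pool the ranking picks the points with predictions `[0.5, 0.5]` and `[0.7, 0.3]`, since they carry the most uncertainty. An attacker would then send those points to the target API and track `mean_mismatch` on the returned predictions as the replica is retrained.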