Machine-learning-as-a-service (MLaaS) has attracted millions of users with its splendid large-scale models. Although published as black-box APIs, the valuable models behind these services are still vulnerable to imitation attacks. Recently, a series of works have demonstrated that attackers can steal or extract the victim models. Nonetheless, none of the previously stolen models could outperform the original black-box APIs. In this work, we conduct unsupervised domain adaptation and multi-victim ensemble to show that attackers could potentially surpass victims, which goes beyond the previous understanding of model extraction. Extensive experiments on both benchmark datasets and real-world APIs validate that imitators can succeed in outperforming the original black-box models on transferred domains. We consider our work a milestone in the research of imitation attacks, especially on NLP APIs, as the superior performance could influence the defense and even the publishing strategies of API providers.
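To make the "multi-victim ensemble" idea concrete, the following is a minimal sketch, not the authors' exact pipeline: the imitator queries several black-box victim APIs on unlabeled text from a transferred domain, aggregates their predictions by majority vote into pseudo-labels, and trains its own model on those pseudo-labels. The names `query`-style callables, `ensemble_pseudo_labels`, and the commented-out `imitation_model` are hypothetical illustrations, not part of the original work.

```python
from collections import Counter
from typing import Callable, List

def ensemble_pseudo_labels(
    texts: List[str],
    victim_apis: List[Callable[[str], str]],
) -> List[str]:
    """Majority-vote the labels returned by each victim API for each input text.

    Each element of ``victim_apis`` is assumed to wrap one black-box API call
    that maps a text to a predicted label string.
    """
    pseudo_labels = []
    for text in texts:
        votes = Counter(api(text) for api in victim_apis)
        # The most common victim prediction becomes the pseudo-label.
        pseudo_labels.append(votes.most_common(1)[0][0])
    return pseudo_labels

# Hypothetical usage: the imitator never sees gold labels for the transferred
# domain; it trains only on (text, pseudo_label) pairs distilled from victims.
# unlabeled_texts = load_transfer_domain_corpus()                  # assumed helper
# labels = ensemble_pseudo_labels(unlabeled_texts, [api_a, api_b, api_c])
# imitation_model.fit(unlabeled_texts, labels)                     # assumed model
```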