Previous studies have shown that the functionality of a black-box model can be stolen when its full probability outputs are available. However, under the more practical hard-label setting, we observe that existing methods suffer catastrophic performance degradation. We attribute this to the loss of the rich information carried by probability predictions and to the overfitting induced by hard labels. To this end, we propose a novel hard-label model stealing method, termed \emph{black-box dissector}, which consists of two erasing-based modules. One is a CAM-driven erasing strategy designed to mine the information hidden in the hard labels returned by the victim model. The other is a random-erasing-based self-knowledge distillation module that uses soft labels from the substitute model to mitigate overfitting. Extensive experiments on four widely used datasets consistently demonstrate that our method outperforms state-of-the-art methods, with an improvement of up to $8.27\%$. We also validate the effectiveness and practical potential of our method against real-world APIs and defense methods. Furthermore, our method benefits other downstream tasks, \emph{i.e.}, transfer-based adversarial attacks.
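To make the CAM-driven erasing module concrete, the following is a minimal PyTorch sketch of the underlying idea: use a class activation map computed from the substitute model to mask out the most attended region of a query image, then send the erased image back to the victim for a fresh hard label. This is an illustrative sketch, not the authors' implementation; the ResNet-18 backbone, the \texttt{erase\_ratio} value, and the \texttt{victim\_api} call are assumptions introduced here for exposition.

\begin{verbatim}
# Hedged sketch of CAM-driven erasing (not the paper's code).
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

substitute = resnet18(num_classes=10)   # substitute (clone) model; assumed
substitute.eval()

def cam_erase(images, labels, erase_ratio=0.25):
    """Erase the top `erase_ratio` most CAM-activated pixels per image."""
    feats = {}
    # Capture the last conv feature map via a forward hook.
    handle = substitute.layer4.register_forward_hook(
        lambda m, i, o: feats.setdefault("map", o))
    with torch.no_grad():
        substitute(images)
    handle.remove()
    fmap = feats["map"]                           # (B, C, h, w)
    w = substitute.fc.weight[labels]              # (B, C) classifier weights
    cam = torch.einsum("bc,bchw->bhw", w, fmap)   # class activation maps
    cam = F.interpolate(cam.unsqueeze(1), size=images.shape[-2:],
                        mode="bilinear", align_corners=False).squeeze(1)
    # Per-image threshold at the (1 - erase_ratio) quantile of the CAM.
    thresh = torch.quantile(cam.flatten(1), 1 - erase_ratio, dim=1)
    mask = (cam >= thresh.view(-1, 1, 1)).unsqueeze(1).float()
    return images * (1 - mask)                    # zero out attended region

# Usage (victim_api is a hypothetical query interface returning hard labels):
# erased = cam_erase(batch, hard_labels)
# new_hard_labels = victim_api(erased)
\end{verbatim}

Erasing the substitute's most attended region forces the victim to reveal which evidence it actually relies on, which is one way the hidden information in hard labels can be surfaced; the self-knowledge distillation module, by contrast, pairs random erasing with the substitute's own soft predictions and needs no extra victim queries.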