The functionality of a deep learning (DL) model can be stolen via model extraction, where an attacker obtains a surrogate model by exploiting the responses of the original model's prediction API. In this work, we propose a novel watermarking technique called DynaMarks to protect the intellectual property (IP) of DL models against such model extraction attacks in a black-box setting. Unlike existing approaches, DynaMarks does not alter the training process of the original model; instead, it embeds a watermark into the surrogate model by dynamically changing the output responses of the original model's prediction API, based on certain secret parameters, at inference runtime. Experimental results on the Fashion-MNIST, CIFAR-10, and ImageNet datasets demonstrate the efficacy of the DynaMarks scheme in watermarking surrogate models while preserving the accuracy of the original models deployed on edge devices. In addition, we evaluate the robustness of DynaMarks against various watermark removal strategies, allowing a DL model owner to reliably prove model ownership.
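To make the idea of dynamically perturbed API responses concrete, the following is a minimal sketch, assuming the prediction API returns a softmax probability vector for each query. The function name, the SHA-256-based key derivation, and the epsilon parameter are illustrative assumptions, not the exact scheme proposed in the paper.

```python
import hashlib

import numpy as np


def dynamic_response(probs: np.ndarray, query_id: bytes, secret_key: bytes,
                     epsilon: float = 0.05) -> np.ndarray:
    """Perturb the API's softmax output with a query-dependent, key-seeded
    noise pattern while keeping the top-1 prediction unchanged (sketch)."""
    # Derive a per-query seed from the secret key so the perturbation is
    # reproducible by the model owner but unpredictable to the attacker.
    seed = int.from_bytes(
        hashlib.sha256(secret_key + query_id).digest()[:4], "big")
    rng = np.random.default_rng(seed)

    # Zero-mean noise keeps the perturbed vector close to a valid distribution.
    noise = rng.uniform(-epsilon, epsilon, size=probs.shape)
    noise -= noise.mean()
    perturbed = np.clip(probs + noise, 1e-6, None)
    perturbed /= perturbed.sum()

    # Fall back to the original output if the top-1 label would change,
    # so the accuracy observed by legitimate users is preserved.
    if perturbed.argmax() != probs.argmax():
        return probs
    return perturbed
```

A surrogate model trained on such responses inherits the key-dependent output pattern, which the owner can later test for to claim ownership.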