False Claims against Model Ownership Resolution

Deep neural network (DNN) models are valuable intellectual property of model owners, constituting a competitive advantage. Therefore, it is crucial to develop techniques to protect against model theft. Model ownership resolution (MOR) is a class of techniques that can deter model theft. A MOR scheme enables an accuser to assert an ownership claim for a suspect model by presenting evidence, such as a watermark or fingerprint, to show that the suspect model was stolen or derived from a source model owned by the accuser. Most of the existing MOR schemes prioritize robustness against malicious suspects, ensuring that the accuser will win if the suspect model is indeed a stolen model. In this paper, we show that common MOR schemes in the literature are vulnerable to a different, equally important but insufficiently explored, robustness concern: a malicious accuser. We show how malicious accusers can successfully make false claims against independent suspect models that were not stolen. Our core idea is that a malicious accuser can deviate (without detection) from the specified MOR process by finding (transferable) adversarial examples that successfully serve as evidence against independent suspect models. To this end, we first generalize the procedures of common MOR schemes and show that, under this generalization, defending against false claims is as challenging as preventing (transferable) adversarial examples. Via systematic empirical evaluation we demonstrate that our false claim attacks always succeed in all prominent MOR schemes with realistic configurations, including against a real-world model: Amazon's Rekognition API.
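The ownership check described above can be pictured with a small sketch. It is a hypothetical simplification, not the paper's formal generalization: it assumes the evidence is a set of inputs with the labels the accuser expects, and that a claim is upheld when the suspect model agrees with those labels often enough. The names `verify_claim`, `suspect_predict`, and `threshold` are illustrative assumptions.

```python
from typing import Callable, Sequence, Tuple


def verify_claim(
    suspect_predict: Callable[[int], int],
    evidence_inputs: Sequence[int],
    evidence_labels: Sequence[int],
    threshold: float,
) -> Tuple[float, bool]:
    """Toy ownership check (hypothetical): uphold the claim when the suspect
    model reproduces enough of the accuser's expected labels."""
    if not evidence_inputs or len(evidence_inputs) != len(evidence_labels):
        raise ValueError("evidence inputs and labels must be non-empty and aligned")
    # Count how many evidence inputs the suspect labels as the accuser predicted.
    matches = sum(
        1 for x, y in zip(evidence_inputs, evidence_labels) if suspect_predict(x) == y
    )
    match_rate = matches / len(evidence_inputs)
    return match_rate, match_rate >= threshold


# Stand-in "model" for illustration only: labels an integer by its parity.
def toy_suspect(x: int) -> int:
    return x % 2


rate, upheld = verify_claim(toy_suspect, [0, 1, 2, 3], [0, 1, 0, 0], threshold=0.7)
```

Under this simplified view, a check of this kind only looks at agreement on the presented inputs. Any inputs on which an independent model happens to give the accuser's expected labels, such as transferable adversarial examples, would pass it equally well, which is the gap the abstract describes.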


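Related content

Adversarial examples

Adversarial examples, introduced by Christian Szegedy et al., are inputs formed by deliberately adding small perturbations to samples from a dataset so that a model gives a wrong output with high confidence. In the context of regularization, adversarial training (training the network on adversarially perturbed training samples) can reduce the error rate on the original independent and identically distributed test set. In essence, an adversarial example is produced by adding a perturbation that humans can hardly perceive but that changes the behavior of the model. It has two basic goals: first, to change the model's prediction; second, to keep the added perturbation small enough that, to a human, it does not appear sufficient to change the prediction and so looks harmless. Research on adversarial examples matters for applications such as autonomous driving and smart homes.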
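As a minimal sketch of the idea above, the following uses the fast gradient sign method (FGSM), a well-known technique that is swapped in here for illustration and is not named in the text. It assumes a toy logistic-regression classifier with hand-picked weights; `w`, `b`, `x`, and `eps` are all hypothetical values chosen so that a bounded perturbation flips the prediction.

```python
import numpy as np


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))


def predict(w: np.ndarray, b: float, x: np.ndarray) -> int:
    """Binary prediction of a toy logistic-regression model."""
    return int(np.dot(w, x) + b > 0)


def fgsm(w: np.ndarray, b: float, x: np.ndarray, y: int, eps: float) -> np.ndarray:
    """One FGSM step: move each input coordinate by eps in the direction
    that increases the cross-entropy loss for the true label y."""
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w  # gradient of the cross-entropy loss w.r.t. the input
    return x + eps * np.sign(grad_x)


# Hypothetical model and input, chosen only for illustration.
w = np.array([1.0, -2.0])
b = 0.0
x = np.array([0.5, 0.1])
y = 1

x_adv = fgsm(w, b, x, y, eps=0.2)
clean_label = predict(w, b, x)    # the model classifies the clean input as 1
adv_label = predict(w, b, x_adv)  # the perturbed input is classified as 0
```

Each coordinate moves by at most `eps`, so the perturbation stays bounded while the predicted label changes. Real attacks on deep networks compute the gradient by backpropagation rather than in closed form.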
