Machine learning (ML) models are costly to train, as they can require significant amounts of data, computational resources, and technical expertise. They therefore constitute valuable intellectual property that needs protection from adversaries who want to steal them. Ownership verification techniques allow the victims of model stealing attacks to demonstrate that a suspect model was in fact derived from their own. Although a number of ownership verification techniques based on watermarking or fingerprinting have been proposed, most fall short either in terms of security guarantees (well-equipped adversaries can evade verification) or computational cost. A fingerprinting technique introduced at ICLR '21, Dataset Inference (DI), has been shown to offer better robustness and efficiency than prior methods. The authors of DI provided a correctness proof for linear (suspect) models. However, in the same setting, we prove that DI suffers from high false positives (FPs): it can incorrectly identify an independent model trained with non-overlapping data from the same distribution as stolen. We further prove that DI also triggers FPs in realistic, non-linear suspect models, and we confirm empirically that these FPs occur with high confidence. We then show that DI also suffers from false negatives (FNs): an adversary can fool DI by regularising a stolen model's decision boundaries using adversarial training. To this end, we demonstrate that DI fails to identify a model adversarially trained from a stolen dataset, the setting in which DI is hardest to evade. Finally, we discuss the implications of our findings and the viability of fingerprinting-based ownership verification in general, and we suggest directions for future work.