Careful What You Wish For: on the Extraction of Adversarially Trained Models

From arXiv. To be published in the proceedings of the 19th Annual International Conference on Privacy, Security & Trust (PST 2022). The conference proceedings will be included in IEEE Xplore, as in previous editions of the conference.

Recent attacks on Machine Learning (ML) models, such as evasion attacks with adversarial examples and model stealing through extraction attacks, pose several security and privacy threats. Prior work proposes adversarial training to secure models against adversarial examples, which can evade a model's classification and degrade its performance. However, this protection technique affects the model's decision boundary and its prediction probabilities, and hence might raise model privacy risks. In fact, a malicious user with only query access to a model's prediction output can extract it and obtain a high-accuracy, high-fidelity surrogate model. To extract more effectively, these attacks leverage the prediction probabilities of the victim model. Yet no previous work on extraction attacks takes into consideration changes made to the training process for security purposes. In this paper, we propose a framework to assess extraction attacks on adversarially trained models with vision datasets. To the best of our knowledge, our work is the first to perform such an evaluation. Through an extensive empirical study, we demonstrate that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances. Extraction attacks against them can achieve up to $\times1.2$ higher accuracy and agreement with a fraction lower than $\times0.75$ of the queries. We additionally find that adversarial robustness is transferable through extraction attacks, i.e., Deep Neural Networks (DNNs) extracted from robust models show enhanced accuracy on adversarial examples compared to DNNs extracted from naturally trained (i.e., standard) models.
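The abstract reports results in terms of a surrogate's accuracy and its agreement with the victim model. As a minimal sketch, assuming agreement means the fraction of inputs on which the surrogate and the victim predict the same top-1 class (the abstract does not spell out the exact definition), the two metrics could be computed as follows. The function names and the toy probability vectors are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def top1(probs: Sequence[float]) -> int:
    """Index of the highest-probability class (first index wins on ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def accuracy(pred_labels: Sequence[int], true_labels: Sequence[int]) -> float:
    """Fraction of inputs whose predicted label equals the ground-truth label."""
    hits = sum(p == t for p, t in zip(pred_labels, true_labels))
    return hits / len(true_labels)


def agreement(surrogate_probs: List[Sequence[float]],
              victim_probs: List[Sequence[float]]) -> float:
    """Assumed definition: fraction of inputs where both models pick the same top-1 class."""
    same = sum(top1(s) == top1(v) for s, v in zip(surrogate_probs, victim_probs))
    return same / len(victim_probs)


# Toy data (hypothetical): four inputs, three classes.
victim_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
surrogate_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.2, 0.3], [0.8, 0.1, 0.1]]
true_labels = [0, 1, 2, 1]

surrogate_labels = [top1(p) for p in surrogate_probs]
surrogate_accuracy = accuracy(surrogate_labels, true_labels)
surrogate_agreement = agreement(surrogate_probs, victim_probs)
```

Accuracy compares the surrogate against ground-truth labels, while agreement compares it against the victim's own predictions, so a surrogate can agree closely with the victim and still inherit its mistakes.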

