Inferring Sensitive Attributes from Model Explanations

Model explanations give a model builder transparency into the black-box behavior of a trained machine learning model. They indicate the influence of each input attribute on the corresponding model prediction. Because explanations depend on the input, they raise privacy concerns for sensitive user data. However, the current literature has limited discussion of the privacy risks of model explanations. We focus on one specific privacy risk, the attribute inference attack, in which an adversary infers sensitive attributes of an input (e.g., race and sex) from its model explanations. We design the first attribute inference attack against model explanations under two threat models, in which the model builder either (a) includes the sensitive attributes in the training data and input or (b) censors the sensitive attributes by excluding them from the training data and input. We evaluate the proposed attack on four benchmark datasets and four state-of-the-art algorithms. We show that an adversary can accurately infer the value of sensitive attributes from explanations in both threat models. Moreover, the attack succeeds even when it exploits only the explanations corresponding to the sensitive attributes. These results suggest that the attack is effective against explanations and poses a practical threat to data privacy. Combining model predictions (an attack surface exploited by prior attacks) with explanations does not improve attack success. Additionally, the attack is more successful when exploiting model explanations than when exploiting model predictions alone. These results suggest that model explanations are a strong attack surface for an adversary to exploit.
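To make threat model (b) concrete, the following is a minimal sketch under stated assumptions, not the attack evaluated in the paper. It assumes synthetic data, a hypothetical linear target model with fixed weights (`WEIGHTS`), gradient-times-input attributions as the explanation, and a nearest-centroid classifier as a stand-in attack model. The sensitive attribute `s` is never given to the target model, yet a correlated proxy feature lets it leak through the explanations. All function names here are illustrative.

```python
import random

def make_records(n, rng):
    """Synthetic records: (sensitive attribute s, non-sensitive features [x1, x2])."""
    records = []
    for _ in range(n):
        s = rng.randint(0, 1)                 # binary sensitive attribute (assumed)
        x1 = 2.0 * s + rng.gauss(0.0, 0.5)    # proxy feature correlated with s
        x2 = rng.gauss(0.0, 1.0)              # feature independent of s
        records.append((s, [x1, x2]))
    return records

# Hypothetical target model weights; s itself is censored from the input.
WEIGHTS = [1.5, -0.7]

def explain(x):
    """Gradient-times-input attribution for a linear model: phi_i = w_i * x_i."""
    return [w * xi for w, xi in zip(WEIGHTS, x)]

def fit_centroids(explanations, labels):
    """Attack 'training': mean explanation vector per sensitive value."""
    centroids = {}
    for value in set(labels):
        rows = [e for e, s in zip(explanations, labels) if s == value]
        centroids[value] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def infer(centroids, explanation):
    """Predict the sensitive value whose centroid is nearest to the explanation."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, explanation))
    return min(centroids, key=lambda v: dist(centroids[v]))

def run_attack(seed=0, n_aux=500, n_victims=500):
    """Fit the attack on auxiliary data, then measure accuracy on unseen victims."""
    rng = random.Random(seed)
    aux = make_records(n_aux, rng)            # adversary's auxiliary data with known s
    victims = make_records(n_victims, rng)    # records whose s the adversary wants
    centroids = fit_centroids([explain(x) for _, x in aux], [s for s, _ in aux])
    hits = sum(infer(centroids, explain(x)) == s for s, x in victims)
    return hits / n_victims

if __name__ == "__main__":
    print(f"attack accuracy: {run_attack():.2f}")
```

In this toy setting the attack works only because `x1` is a strong proxy for `s`. The accuracy it prints says nothing about the results reported on the benchmark datasets.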

