Increasing use of machine learning (ML) technologies in privacy-sensitive domains such as medical diagnosis, lifestyle prediction, and business decision-making highlights the need to better understand whether these ML technologies are leaking sensitive and proprietary training data. In this paper, we focus on model inversion attacks in which the adversary knows non-sensitive attributes of records in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using only black-box access to the target classification model. We first devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state of the art. We then introduce a label-only model inversion attack that relies solely on the model's predicted labels yet matches our confidence score-based attack in attack effectiveness. We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary. We evaluate our attacks on two types of machine learning models, decision trees and deep neural networks, trained on three real datasets. Moreover, we empirically demonstrate the disparate vulnerability to model inversion attacks, i.e., specific groups in the training dataset (grouped by gender, race, etc.) can be more vulnerable to model inversion attacks than others.
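The core idea of the confidence score-based attack can be illustrated in a few lines: the adversary enumerates candidate values of the sensitive attribute, completes the target record with each candidate, queries the black-box model, and guesses the candidate for which the model is most confident in the record's known true label. The sketch below is a minimal illustration under these assumptions, not the paper's full method; `query_model`, `known_attrs`, `true_label`, and `candidates` are hypothetical names.

```python
def confidence_attack(query_model, known_attrs, true_label, candidates):
    """Minimal sketch of a confidence score-based attribute inference attack.

    query_model -- hypothetical black-box interface; assumed to take a full
                   feature vector and return per-class confidence scores
    known_attrs -- the non-sensitive attribute values known to the adversary
    true_label  -- the record's true class label, known to the adversary
    candidates  -- the possible values of the sensitive attribute
    """
    best_value, best_conf = None, float("-inf")
    for value in candidates:
        # Complete the record with the guessed sensitive value and query
        # the target model (black-box access only).
        record = known_attrs + [value]
        probs = query_model(record)
        # Keep the candidate under which the model is most confident
        # in the record's known true label.
        if probs[true_label] > best_conf:
            best_value, best_conf = value, probs[true_label]
    return best_value
```

A label-only variant, as the abstract notes, would have to make this decision from predicted labels alone, without access to the confidence vector `probs`.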