Machine learning (ML) models have been deployed in high-stakes applications, e.g., healthcare and criminal justice. Prior work has shown that ML models are vulnerable to attribute inference attacks, in which an adversary with some background knowledge trains an ML attack model to infer sensitive attributes by exploiting distinguishable model predictions. However, some prior attribute inference attacks make strong assumptions about the adversary's background knowledge (e.g., the marginal distribution of the sensitive attribute) and pose no more privacy risk than statistical inference. Moreover, none of the prior attacks account for the class imbalance of the sensitive attribute in datasets from real-world applications (e.g., Race and Sex). In this paper, we propose a practical and effective attribute inference attack that accounts for this imbalance using an adaptive threshold over the attack model's predictions. We exhaustively evaluate our proposed attack on multiple datasets and show that the adaptive threshold over the model's predictions drastically improves the attack accuracy over prior work. Finally, the current literature lacks an effective defense against attribute inference attacks. We investigate the impact of fairness constraints (i.e., constraints designed to mitigate unfairness in model predictions) imposed during model training on our attribute inference attack. We show that constraint-based fairness algorithms that enforce equalized odds act as an effective defense against attribute inference attacks without impacting model utility. Hence, the objectives of algorithmic fairness and sensitive-attribute privacy are aligned.
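To make the adaptive-threshold idea concrete, the sketch below illustrates one plausible instantiation; it is not the paper's exact procedure, and all names (`aux_scores`, `aux_labels`, `choose_threshold`) are hypothetical. The intuition: instead of the default 0.5 cutoff on the attack model's predicted probability for the sensitive attribute, the adversary picks a threshold on auxiliary data so that an imbalanced minority class (e.g., a rare Race or Sex value) is not systematically missed.

```python
# Illustrative sketch of an adaptive threshold for an attribute inference attack.
# Assumption: the adversary tunes the threshold to maximize balanced accuracy on
# auxiliary data, one reasonable way to account for sensitive-attribute imbalance.
import numpy as np
from sklearn.metrics import balanced_accuracy_score


def choose_threshold(aux_scores: np.ndarray, aux_labels: np.ndarray) -> float:
    """Pick the score threshold that maximizes balanced accuracy on auxiliary data."""
    best_t, best_score = 0.5, -1.0
    for t in np.unique(aux_scores):
        preds = (aux_scores >= t).astype(int)
        score = balanced_accuracy_score(aux_labels, preds)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def infer_sensitive_attribute(attack_scores: np.ndarray, threshold: float) -> np.ndarray:
    """Apply the tuned threshold to the attack model's scores on target records."""
    return (attack_scores >= threshold).astype(int)
```

Under this reading, the gain over prior work comes purely from the decision rule: the attack model is unchanged, but the threshold adapts to the skew of the sensitive attribute rather than assuming balanced classes.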