A membership inference attack (MIA) against a machine-learning model enables an attacker to determine whether a given data record was part of the model's training data. In this paper, we provide an in-depth study of the phenomenon of disparate vulnerability to MIAs: the unequal success rate of MIAs against different population subgroups. We first establish necessary and sufficient conditions for preventing MIAs, both on average and for population subgroups, using a notion of distributional generalization. Second, we derive connections between disparate vulnerability, algorithmic fairness, and differential privacy. We show that fairness can only prevent disparate vulnerability against limited classes of adversaries. Differential privacy bounds disparate vulnerability but can significantly reduce the accuracy of the model. We show that estimating disparate vulnerability to MIAs by na\"ively applying existing attacks can lead to overestimation. We then establish which attacks are suitable for estimating disparate vulnerability, and provide a statistical framework for doing so reliably. We conduct experiments on synthetic and real-world data, finding statistically significant evidence of disparate vulnerability in realistic settings.