How much does a given trained model leak about each individual data record in its training set? Membership inference attacks are used as an auditing tool to quantify the private information that a model leaks about the individual data points in its training set. Membership inference attacks are influenced by different uncertainties that an attacker has to resolve about the training data, the training algorithm, and the underlying data distribution. Thus, the attack success rates of many attacks in the literature do not precisely capture the information leakage of models about their data, as they also reflect the other uncertainties that the attack algorithm has to resolve. In this paper, we explain the implicit assumptions and the simplifications made in prior work using the framework of hypothesis testing. We also derive new attack algorithms from this framework that can achieve a high AUC score while also highlighting the different factors that affect their performance. Our algorithms capture a very precise approximation of the privacy loss in models, and can be used as a tool to perform an accurate and informed estimation of the privacy risk of machine learning models. We provide a thorough empirical evaluation of our attack strategies on various machine learning tasks and benchmark datasets.
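To make the hypothesis-testing view of membership inference concrete, below is a minimal sketch (not the attacks derived in the paper) of the simplest baseline: treating a record's loss under the target model as the test statistic for "member vs. non-member" and measuring attack performance with the AUC score mentioned above. The synthetic loss distributions and function names here are illustrative assumptions, not data or code from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def loss_threshold_scores(losses):
    """Membership score for a loss-threshold test: lower loss -> more
    likely the record was in the training set. Sweeping the threshold
    over these scores traces out the attack's ROC curve."""
    return -np.asarray(losses)


# Illustrative synthetic losses (assumed for this sketch): training
# members tend to incur smaller loss than held-out non-members.
rng = np.random.default_rng(0)
member_losses = rng.gamma(shape=2.0, scale=0.05, size=1000)     # in-training records
nonmember_losses = rng.gamma(shape=2.0, scale=0.15, size=1000)  # held-out records

scores = np.concatenate([
    loss_threshold_scores(member_losses),
    loss_threshold_scores(nonmember_losses),
])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])  # 1 = member, 0 = non-member

print(f"attack AUC: {roc_auc_score(labels, scores):.3f}")
```

A baseline like this conflates the model's actual leakage with the attacker's uncertainty about the data distribution and the training algorithm; the attacks in the paper are designed to separate those factors.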