How much does a machine learning algorithm leak about its training data, and why? Membership inference attacks are used as an auditing tool to quantify this leakage. In this paper, we present a comprehensive \textit{hypothesis testing framework} that enables us not only to formally express the prior work in a consistent way, but also to design new membership inference attacks that use reference models to achieve a significantly higher power (true positive rate) at any given error (false positive rate). More importantly, we explain \textit{why} different attacks perform differently. We present a template for indistinguishability games, and provide an interpretation of attack success rate across different instances of the game. We discuss the various uncertainties of attackers that arise from the formulation of the problem, and show how our approach tries to reduce the attack uncertainty to the one-bit secret about the presence or absence of a data point in the training set. We perform a \textit{differential analysis} between all types of attacks, explain the gap between them, and show what causes data points to be vulnerable to an attack (as the reasons vary due to different granularities of memorization, from overfitting to conditional memorization). Our auditing framework is openly accessible as part of the \textit{Privacy Meter} software tool.
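The per-example hypothesis test underlying the reference-model attacks can be sketched as follows. This is a minimal illustration, not the paper's exact procedure or the \textit{Privacy Meter} API: the function name \texttt{reference\_model\_attack}, the Gaussian model of the ``out'' loss distribution, and the synthetic losses are assumptions made for the sketch.

\begin{verbatim}
import numpy as np
from scipy.stats import norm

def reference_model_attack(target_loss, reference_losses, alpha=0.01):
    """Per-example hypothesis test for membership.

    H0: the point is NOT a member, so its loss under the target model
        should look like its losses under reference models trained
        without it. Members tend to have anomalously low loss.

    target_loss      -- target model's loss on the candidate point
    reference_losses -- losses of "out" reference models on that point
    alpha            -- false positive rate budget of the test
    """
    mu = np.mean(reference_losses)
    sigma = np.std(reference_losses) + 1e-12  # guard against zero variance
    # One-sided test: reject H0 when the observed loss is improbably low
    p_value = norm.cdf(target_loss, loc=mu, scale=sigma)
    return p_value < alpha  # True => predict "member"

# Toy usage with synthetic losses (illustration only)
rng = np.random.default_rng(0)
out_losses = rng.normal(2.0, 0.5, size=64)
print(reference_model_attack(0.3, out_losses))  # low loss -> member
print(reference_model_attack(2.1, out_losses))  # typical loss -> non-member
\end{verbatim}

Calibrating the rejection threshold per example against the ``out'' distribution, rather than using one global loss threshold, is what lets such attacks trade power against false positive rate at any chosen operating point.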