Recently, it has been shown that machine learning models can leak sensitive information about their training data. This information leakage is exposed through membership and attribute inference attacks. Although many attack strategies have been proposed, little effort has been made to formalize these problems. We present a novel formalism, generalizing membership and attribute inference attack setups previously studied in the literature and connecting them to memorization and generalization. First, we derive a universal bound on the success rate of inference attacks and connect it to the generalization gap of the target model. Second, we study the question of how much sensitive information about its training set is stored by the learning algorithm, and we derive bounds on the mutual information between the sensitive attributes and model parameters. Experimentally, we illustrate the potential of our approach by applying it to both synthetic data and classification tasks on natural images. Finally, we apply our formalism to different attribute inference strategies, with which an adversary is able to recover the identity of writers in the PenDigits dataset.
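To make the setting concrete, below is a minimal sketch of a standard loss-threshold membership inference baseline (in the spirit of prior work, not the formalism derived in this paper), illustrating empirically how attack advantage tracks the generalization gap of the target model. The dataset, model, and threshold choice are illustrative assumptions only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative setup: half of the data trains the target model ("members"),
# the other half is held out ("non-members").
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_mem, X_non, y_mem, y_non = train_test_split(X, y, test_size=0.5, random_state=0)

# Target model trained only on the member split.
target = LogisticRegression(max_iter=1000).fit(X_mem, y_mem)

def per_sample_loss(model, X, y):
    """Cross-entropy loss of the target model on each individual example."""
    probs = model.predict_proba(X)
    return -np.log(np.clip(probs[np.arange(len(y)), y], 1e-12, None))

loss_mem = per_sample_loss(target, X_mem, y_mem)
loss_non = per_sample_loss(target, X_non, y_non)

# Loss-threshold attack: predict "member" when the per-sample loss is below tau.
# The median of the pooled losses is an arbitrary illustrative threshold.
tau = np.median(np.concatenate([loss_mem, loss_non]))
tpr = np.mean(loss_mem < tau)   # members correctly flagged
fpr = np.mean(loss_non < tau)   # non-members wrongly flagged

print(f"attack advantage (TPR - FPR):      {tpr - fpr:.3f}")
print(f"generalization gap (mean loss):    {loss_non.mean() - loss_mem.mean():.3f}")
```

In this sketch, a larger train-test loss gap gives the thresholding adversary more room to separate members from non-members, which is the intuition behind relating inference attack success to generalization.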