Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $\epsilon$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of such attacks.
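As a brief sketch of the parenthetical claim that $\epsilon$ scales with $n$, consider the standard group-privacy property of differential privacy (the mechanism $M$, datasets $D, D'$, and group size $k$ below are our notation, not terms from this abstract). When samples are statistically dependent, changing one individual's data can imply changes to up to $n$ training records, so the neighbouring-dataset guarantee must be applied at the group level:

\[
\Pr[M(D) \in S] \;\le\; e^{k\epsilon}\, \Pr[M(D') \in S]
\quad \text{for all } S \text{ and all } D, D' \text{ differing in } k \text{ records.}
\]

Under full dependence $k = n$, so the effective privacy parameter becomes $n\epsilon$, which provides essentially no protection for realistic training set sizes.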