We study the membership inference (MI) attack against classifiers, where the attacker's goal is to determine whether a data instance was used for training the classifier. Through systematic cataloging of existing MI attacks and extensive experimental evaluation of them, we find that a model's vulnerability to MI attacks is tightly related to its generalization gap -- the difference between training accuracy and test accuracy. We then propose a defense against MI attacks that aims to close this gap by intentionally reducing the training accuracy. More specifically, the training process attempts to match the training and validation accuracies by means of a new {\em set regularizer} that uses the Maximum Mean Discrepancy (MMD) between the empirical distributions of the softmax outputs on the training and validation sets. Our experimental results show that combining this approach with another simple defense (mix-up training) significantly improves the state of the art in defending against MI attacks, with minimal impact on testing accuracy.
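For concreteness, a standard instantiation of such a set regularizer is the biased empirical estimate of the squared MMD between the softmax output samples $\{p_i\}_{i=1}^{n}$ on the training set and $\{q_j\}_{j=1}^{m}$ on the validation set, for some kernel $k$; this is a sketch, as the specific kernel and estimator are not fixed by the abstract:
\[
% Sketch only: the kernel k and the biased estimator are assumptions,
% not details stated in this abstract.
\widehat{\mathrm{MMD}}^2 \;=\; \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} k(p_i, p_j)
\;+\; \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} k(q_i, q_j)
\;-\; \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} k(p_i, q_j).
\]
Adding this term to the training loss penalizes divergence between the two softmax output distributions, which is what drives the training accuracy toward the validation accuracy.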