On the Difficulty of Membership Inference Attacks

Recent studies propose membership inference (MI) attacks on deep models, where the goal is to infer whether a sample was used in the training process. Despite their apparent success, these studies report only the accuracy, precision, and recall of the positive class (member class). Hence, the performance of these attacks on the negative class (non-member class) has not been clearly reported. In this paper, we show that the way MI attack performance has been reported is often misleading, because the attacks suffer from a high false positive rate, or false alarm rate (FAR), that has gone unreported. FAR measures how often the attack model mislabels non-training samples (non-members) as training samples (members). A high FAR makes MI attacks fundamentally impractical, and the problem is especially significant for tasks such as membership inference, where the majority of samples in reality belong to the negative (non-training) class. Moreover, we show that current MI attack models can, with mediocre accuracy at best, identify the membership only of misclassified samples, which constitute a very small portion of the training samples. We analyze several new features that have not been comprehensively explored for membership inference before, including distance to the decision boundary and gradient norms, and conclude that deep models' responses are mostly similar for training and non-training samples. We conduct several experiments on image classification tasks, including MNIST, CIFAR-10, CIFAR-100, and ImageNet, using various model architectures, including LeNet, AlexNet, ResNet, etc. We show that the current state-of-the-art MI attacks cannot achieve high accuracy and low FAR at the same time, even when the attacker is given several advantages. The source code is available at https://github.com/shrezaei/MI-Attack.
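
To make the reporting issue concrete, the following minimal sketch computes FAR alongside the positive-class metrics for a toy attack. It is not taken from the paper's repository: the function names `mi_attack_metrics` and `simulate_attack` and all counts are hypothetical, chosen only to illustrate how precision can collapse when non-members dominate while FAR stays the same.

```python
def mi_attack_metrics(y_true, y_pred):
    """Compute attack metrics; labels use 1 = member, 0 = non-member."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # FAR: fraction of non-members wrongly flagged as members.
        "far": fp / (fp + tn) if fp + tn else 0.0,
    }


def simulate_attack(members, members_flagged, non_members, non_members_flagged):
    """Build label lists for a hypothetical attack from raw counts."""
    y_true = [1] * members + [0] * non_members
    y_pred = (
        [1] * members_flagged + [0] * (members - members_flagged)
        + [1] * non_members_flagged + [0] * (non_members - non_members_flagged)
    )
    return y_true, y_pred


# Hypothetical attack: flags 90% of members and 60% of non-members.
# Balanced evaluation set (100 members, 100 non-members).
balanced = mi_attack_metrics(*simulate_attack(100, 90, 100, 60))

# Same attack behavior, but non-members outnumber members 100 to 1.
imbalanced = mi_attack_metrics(*simulate_attack(100, 90, 10000, 6000))

for name, m in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

In this toy setting, recall and FAR are identical in both evaluations, yet precision falls from 0.6 on the balanced set to roughly 0.015 on the imbalanced one. Reporting only positive-class metrics on a balanced set would hide this.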
