Neural networks are susceptible to data inference attacks such as the membership inference attack, the adversarial model inversion attack, and the attribute inference attack, in which the attacker infers useful information about a data sample, such as its membership in the training set, its reconstruction, or its sensitive attributes, from the confidence scores predicted by the target classifier. In this paper, we propose a method, namely PURIFIER, to defend against membership inference attacks. It transforms the confidence score vectors predicted by the target classifier so that the purified confidence scores are indistinguishable between members and non-members in individual shape, statistical distribution, and prediction label. Experimental results show that PURIFIER defends against membership inference attacks with high effectiveness and efficiency, outperforming previous defense methods while incurring negligible utility loss. Our further experiments show that PURIFIER is also effective in defending against adversarial model inversion attacks and attribute inference attacks. For example, when PURIFIER is deployed in our experiments, the inversion error on the Facescrub530 classifier rises by roughly a factor of four, and attribute inference accuracy drops significantly.
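The abstract describes PURIFIER only at a high level, as a transformation applied to the target classifier's confidence score vectors. The sketch below is an illustrative reading of that idea, assuming an autoencoder-style purifier trained to reconstruct confidence vectors from a reference set while preserving the predicted label; the class name `ConfidencePurifier`, the layer sizes, and the loss weighting are our assumptions, not the paper's specification.

```python
# A minimal sketch of a confidence-score purifier, assuming an
# autoencoder-style network over the target classifier's output
# probabilities. Architecture and loss weights are illustrative
# assumptions, not the paper's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ConfidencePurifier(nn.Module):
    """Maps a raw confidence vector to a purified one over the same classes."""

    def __init__(self, num_classes: int, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(num_classes, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, num_classes)

    def forward(self, conf: torch.Tensor) -> torch.Tensor:
        # Re-normalize so the output is still a valid probability vector.
        return F.softmax(self.decoder(self.encoder(conf)), dim=-1)


def train_step(purifier: ConfidencePurifier,
               optimizer: torch.optim.Optimizer,
               conf_batch: torch.Tensor) -> float:
    """One training step on a batch of reference-set confidence vectors:
    reconstruct the scores while keeping the top-1 prediction unchanged."""
    purified = purifier(conf_batch)
    # Reconstruction term: purified scores should stay close to the originals.
    recon_loss = F.mse_loss(purified, conf_batch)
    # Label-preservation term: penalize changing the predicted class
    # (NLL of the original argmax label under the purified distribution).
    labels = conf_batch.argmax(dim=-1)
    label_loss = F.nll_loss(torch.log(purified + 1e-12), labels)
    loss = recon_loss + 0.1 * label_loss  # 0.1 is an assumed weight
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Inference-time usage (hypothetical): release the purified scores
# instead of the classifier's raw output, e.g.
#   raw = target_model(x).softmax(dim=-1)
#   released = purifier(raw)
```

Under this reading, the purifier acts as a post-processing layer on released confidence scores only, so the target classifier itself and its predicted labels need not change, which is consistent with the negligible utility loss reported in the abstract.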