Recently, membership inference attacks have posed a serious threat to the privacy of the confidential training data of machine learning models. This paper proposes a novel adversarial-example-based privacy-preserving technique (AEPPT), which adds crafted adversarial perturbations to the prediction of the target model to mislead the adversary's membership inference model. The added adversarial perturbations do not affect the accuracy of the target model, but prevent the adversary from inferring whether a specific data sample was in the training set of the target model. Since AEPPT only modifies the original output of the target model, the proposed method is general and does not require modifying or retraining the target model. Experimental results show that the proposed method reduces the inference accuracy and precision of the membership inference model to around 50%, which is close to a random guess. Further, AEPPT is also demonstrated to be effective against adaptive attacks in which the adversary knows the defense mechanism. Compared with state-of-the-art defense methods, the proposed defense significantly degrades the accuracy and precision of membership inference attacks to 50% (i.e., the same as a random guess) while the performance and utility of the target model are not affected.
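The core constraint described above is that the perturbation added to the model's confidence vector must not change the predicted label (so accuracy is preserved) while still distorting the scores the membership inference model relies on. The following is a minimal sketch of that constraint, not the paper's actual perturbation-crafting algorithm; the function name, the noise direction, and the step size `eps` are all illustrative assumptions.

```python
def perturb_prediction(probs, noise, eps=0.05):
    """Add a small crafted perturbation to a confidence vector.

    probs: original softmax output of the target model (sums to 1).
    noise: a crafted perturbation direction (hypothetical; in AEPPT this
           would be an adversarial direction against the inference model).
    eps:   perturbation magnitude (illustrative value).
    """
    # Apply the perturbation and clip away non-positive entries.
    perturbed = [max(p + eps * n, 1e-6) for p, n in zip(probs, noise)]
    # Renormalize so the output is still a valid probability vector.
    total = sum(perturbed)
    perturbed = [p / total for p in perturbed]
    # Reject the perturbation if it would flip the predicted label,
    # keeping the target model's accuracy unchanged.
    orig_label = max(range(len(probs)), key=probs.__getitem__)
    new_label = max(range(len(perturbed)), key=perturbed.__getitem__)
    if new_label != orig_label:
        return probs
    return perturbed
```

For example, perturbing `[0.7, 0.2, 0.1]` with the direction `[-1.0, 1.0, 0.0]` flattens the confidence gap between the top two classes without changing the argmax, which is the kind of signal distortion a membership inference model is sensitive to.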