A membership inference attack (MIA) poses a privacy risk to the training data of a machine learning model: given a target record, an attacker guesses whether it was a member of the model's training dataset. The state-of-the-art defense against MIAs, distillation for membership privacy (DMP), requires not only the private data to be protected but also a large amount of unlabeled public data. However, in certain privacy-sensitive domains, such as medicine and finance, the availability of public data is not guaranteed. Moreover, a trivial workaround that generates public data with generative adversarial networks significantly decreases model accuracy, as reported by the authors of DMP. To overcome this problem, we propose a novel defense against MIAs that uses knowledge distillation without requiring public data. Our experiments show that the privacy protection and accuracy of our defense are comparable to those of DMP on Purchase100 and Texas100, the benchmark tabular datasets used in MIA research, and that on the image dataset CIFAR10 our defense achieves a much better privacy-utility trade-off than existing defenses that likewise use no public data.
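For readers unfamiliar with the underlying mechanism, the following is a minimal sketch of generic knowledge distillation, the building block the proposed defense relies on; it is not the paper's specific method, and the model architectures, temperature, and hyperparameters below are illustrative assumptions. The student is trained only on the teacher's softened output distribution, never on the hard training labels directly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical teacher/student models on 20-dim inputs with 10 classes;
# in practice the teacher would already be trained on the private data.
torch.manual_seed(0)
teacher = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 10))

x = torch.randn(256, 20)  # stand-in for the distillation inputs
T = 4.0                   # distillation temperature (assumed value)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
teacher.eval()

for step in range(100):
    with torch.no_grad():
        # Softened teacher predictions serve as the training targets.
        soft_targets = F.softmax(teacher(x) / T, dim=1)
    log_probs = F.log_softmax(student(x) / T, dim=1)
    # KL divergence between the softened teacher and student distributions;
    # the T*T factor keeps gradient magnitudes comparable across temperatures.
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The privacy intuition is that the student only observes the teacher's soft outputs rather than the private labels, which dampens the memorization signals that MIAs exploit; DMP additionally requires that the distillation inputs come from public data, which is the requirement the paper's defense removes.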