Artificial intelligence and machine learning have been integrated into every aspect of our lives, and the privacy of personal data has attracted increasing attention. Because training a model requires extracting useful information from the training data, the model risks leaking the privacy of that data. Membership inference attacks can, to a certain degree, measure how much a model leaks about its source data. In this paper, we design a privacy-preserving generative framework against membership inference attacks: we exploit the information-extraction and data-generation capabilities of the variational autoencoder (VAE) to generate synthetic data that satisfies differential privacy. Instead of adding noise to the model output or tampering with the training process of the target model, we process the original data directly. We first map the source data into the latent space through the VAE encoder to obtain latent codes, then perturb the latent codes with noise satisfying metric privacy, and finally reconstruct synthetic data with the VAE decoder. Our experimental evaluation demonstrates that machine learning models trained on the newly generated synthetic data can effectively resist membership inference attacks while maintaining high utility.
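The core perturbation step above (adding metric-privacy noise to the latent codes before decoding) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Euclidean-metric d_X-privacy, for which a standard mechanism samples a noise direction uniformly on the unit sphere and a radius from a Gamma(d, 1/ε) distribution, yielding a density proportional to exp(-ε·‖n‖). The `encoder`/`decoder` calls in the usage comment are hypothetical placeholders for a trained VAE.

```python
import numpy as np

def metric_privacy_perturb(z, epsilon, rng=None):
    """Perturb latent codes z (shape [..., d]) with multivariate Laplace
    noise whose density is proportional to exp(-epsilon * ||n||),
    satisfying epsilon-d_X-privacy under the Euclidean metric.
    This is an illustrative sketch, not the paper's exact mechanism."""
    rng = np.random.default_rng() if rng is None else rng
    d = z.shape[-1]
    # Uniformly random direction on the unit sphere in R^d.
    direction = rng.normal(size=z.shape)
    direction /= np.linalg.norm(direction, axis=-1, keepdims=True)
    # Radius ~ Gamma(d, 1/epsilon): combined with the uniform direction,
    # the noise vector has density proportional to exp(-epsilon * r).
    radius = rng.gamma(shape=d, scale=1.0 / epsilon,
                       size=z.shape[:-1] + (1,))
    return z + radius * direction

# Hypothetical end-to-end pipeline with a trained VAE:
#   z      = encoder(x_source)              # map data to latent space
#   z_priv = metric_privacy_perturb(z, eps) # noise in latent space
#   x_syn  = decoder(z_priv)                # reconstruct synthetic data
```

Larger ε concentrates the radius near zero (its mean is d/ε), trading privacy for fidelity of the reconstructed synthetic data.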