Machine learning tools are becoming increasingly powerful and widely used. Unfortunately, membership attacks, which seek to uncover information about the data sets used to train machine learning models, have the potential to limit data sharing. In this paper we consider an approach to increasing the privacy protection of data sets, applied to face recognition. Using an auxiliary face recognition model, we build on the StyleGAN generative adversarial network and feed it latent codes combining two distinct sub-codes, one encoding visual identity factors and the other non-identity factors. By independently varying these sub-codes during image generation, we create a synthetic data set of fictitious face identities, which we use to train a face recognition model. When tested with a simple membership attack, our model provides good privacy protection; however, its performance degrades in comparison to the state of the art in face verification. We find that the addition of a small amount of private data greatly improves the performance of our model, which highlights the limitations of using synthetic data to train machine learning models.
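The sampling scheme described above — a latent code split into an identity sub-code, held fixed per fictitious identity, and a non-identity sub-code, resampled per image — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the generator here is a toy stand-in for a StyleGAN-style network, and the dimensions and function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

ID_DIM, NONID_DIM = 256, 256  # hypothetical split of the latent code


def generate_image(latent):
    """Toy stand-in for a StyleGAN-style generator mapping a latent
    code to an image; a real model would be used in practice."""
    # Mix a slice of the identity part with a slice of the non-identity part.
    return np.tanh(latent[:64] + latent[ID_DIM:ID_DIM + 64]).reshape(8, 8)


def synthesize_identity_set(n_identities, images_per_identity):
    """For each fictitious identity, fix the identity sub-code and vary
    only the non-identity sub-code (pose, lighting, etc.) per image."""
    dataset = []
    for ident in range(n_identities):
        z_id = rng.standard_normal(ID_DIM)  # shared within one identity
        for _ in range(images_per_identity):
            z_nonid = rng.standard_normal(NONID_DIM)  # fresh per image
            latent = np.concatenate([z_id, z_nonid])
            dataset.append((ident, generate_image(latent)))
    return dataset


data = synthesize_identity_set(n_identities=3, images_per_identity=4)
print(len(data))  # 12 labelled synthetic images
```

Because each identity label is paired with several images that share the identity sub-code, the resulting set can serve as labelled training data for a face recognition model without containing any real person.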
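The "simple membership attack" used for evaluation is not specified in the abstract; a common minimal baseline is a loss-threshold attack, in which samples with low model loss are predicted to be training-set members. The sketch below assumes that baseline with synthetic loss values; thresholds and distributions are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)


def loss_threshold_attack(losses, threshold):
    """Loss-threshold membership inference: predict that a sample was
    in the training set if the model's loss on it is below threshold."""
    return losses < threshold


# Toy illustration: members tend to incur lower loss than non-members.
member_losses = rng.normal(0.2, 0.05, size=100)     # hypothetical values
nonmember_losses = rng.normal(0.8, 0.05, size=100)  # hypothetical values

pred_members = loss_threshold_attack(member_losses, threshold=0.5)
pred_nonmembers = loss_threshold_attack(nonmember_losses, threshold=0.5)
accuracy = (pred_members.sum() + (~pred_nonmembers).sum()) / 200
print(accuracy)
```

A model that provides good privacy protection keeps this attack's accuracy close to 50%, i.e., no better than guessing whether a sample was used in training.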