Data privacy has become an increasingly important issue in Machine Learning (ML), and many approaches have been developed to address it, e.g. cryptography (Homomorphic Encryption (HE), Differential Privacy (DP), etc.) and collaborative training (Secure Multi-Party Computation (MPC), Distributed Learning and Federated Learning (FL)). These techniques focus on data encryption or secure local computation: they pass intermediate information to a third party, which computes the final result. In Deep Learning (DL), gradient exchange is commonly considered a secure way of training a robust model collaboratively. However, recent research has demonstrated that sensitive information can be recovered from the shared gradient. Generative Adversarial Networks (GANs), in particular, have been shown to be effective in recovering such information. GAN-based techniques, however, require additional information, such as class labels, which is generally unavailable in privacy-preserving learning. In this paper, we show that, in an FL system, private image data can be easily recovered in full from the shared gradient alone via our proposed Generative Regression Neural Network (GRNN). We formulate the attack as a regression problem and optimize two branches of the generative model by minimizing the distance between gradients. We evaluate our method on several image classification tasks. The results show that the proposed GRNN outperforms state-of-the-art methods with better stability, stronger robustness, and higher accuracy. It also does not require the global FL model to have converged. Moreover, we demonstrate information leakage using face re-identification. Some defense strategies are also discussed in this work.
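The idea of recovering data by minimizing the distance between gradients can be illustrated with a minimal sketch. This is not GRNN itself: it assumes a toy one-sample linear model with squared loss, and it optimizes a guessed input and label directly instead of training a two-branch generator. All names (`model_gradient`, `gradient_distance`, `recover`) and the numeric values are hypothetical and chosen only for illustration.

```python
import random


def model_gradient(w, b, x, y):
    """Gradient of 0.5 * (w.x + b - y)^2 with respect to (w, b) for one sample."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    return [residual * xi for xi in x] + [residual]


def gradient_distance(w, b, x_hat, y_hat, shared_grad):
    """Squared L2 distance between the guess's gradient and the shared gradient."""
    fake_grad = model_gradient(w, b, x_hat, y_hat)
    return sum((f - s) ** 2 for f, s in zip(fake_grad, shared_grad))


def recover(w, b, shared_grad, steps=200, eps=1e-6, seed=0):
    """Search for (x_hat, y_hat) whose gradient matches the shared gradient."""
    rng = random.Random(seed)
    # Parameters of the guess: the input features followed by the label.
    params = [rng.uniform(-1.0, 1.0) for _ in range(len(w) + 1)]

    def objective(p):
        return gradient_distance(w, b, p[:-1], p[-1], shared_grad)

    history = [objective(params)]
    for _ in range(steps):
        base = objective(params)
        # Forward finite-difference estimate of the objective's gradient.
        grad = []
        for i in range(len(params)):
            bumped = list(params)
            bumped[i] += eps
            grad.append((objective(bumped) - base) / eps)
        # Backtracking line search: accept a step only if it lowers the objective.
        step = 1.0
        while step > 1e-12:
            candidate = [p - step * g for p, g in zip(params, grad)]
            if objective(candidate) < base:
                params = candidate
                break
            step *= 0.5
        history.append(objective(params))
    return params[:-1], params[-1], history


# Toy setup (illustrative values): the attacker knows w, b and the shared gradient.
w, b = [0.5, -1.0, 2.0], 0.1
x_true, y_true = [1.0, 2.0, -0.5], 3.0
shared_grad = model_gradient(w, b, x_true, y_true)
x_hat, y_hat, history = recover(w, b, shared_grad)
```

The objective is zero at the true sample and positive for a mismatched guess, which is why driving it toward zero can leak the private input. A deep image classifier makes the search far harder than in this toy case.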