The widespread availability of rich data has fueled the growth of machine learning applications in numerous domains. However, growth in domains with highly sensitive data (e.g., medical) is largely hindered, as the private nature of the data prohibits it from being shared. To this end, we propose Gradient-sanitized Wasserstein Generative Adversarial Networks (GS-WGAN), which allow releasing a sanitized form of the sensitive data with rigorous privacy guarantees. In contrast to prior work, our approach distorts gradient information more precisely, thereby enabling the training of deeper models that generate more informative samples. Moreover, our formulation naturally allows for training GANs in both centralized and federated (i.e., decentralized) data scenarios. Through extensive experiments, we find that our approach consistently outperforms state-of-the-art approaches across multiple metrics (e.g., sample quality) and datasets.
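As a rough illustration of what "sanitizing" a gradient means in this context, the sketch below clips a gradient to a bounded L2 norm and perturbs it with calibrated Gaussian noise (the standard Gaussian mechanism). This is only a minimal, hedged sketch assuming PyTorch; the function and parameter names (`sanitize_gradient`, `clip_bound`, `noise_multiplier`) are illustrative and not taken from the paper's implementation.

```python
# Minimal sketch of gradient sanitization via the Gaussian mechanism:
# bound the gradient's L2 norm, then add noise proportional to that bound.
# Names and default values here are illustrative assumptions, not GS-WGAN's API.

import torch

def sanitize_gradient(grad: torch.Tensor,
                      clip_bound: float = 1.0,
                      noise_multiplier: float = 1.0) -> torch.Tensor:
    """Clip the gradient's L2 norm to `clip_bound` and add Gaussian noise
    with standard deviation `noise_multiplier * clip_bound`."""
    grad_norm = grad.norm(p=2)
    # Rescale only when the norm exceeds the bound (standard DP-style clipping).
    clipped = grad * (clip_bound / torch.clamp(grad_norm, min=clip_bound))
    noise = torch.randn_like(clipped) * (noise_multiplier * clip_bound)
    return clipped + noise

if __name__ == "__main__":
    g = torch.randn(64, 128)           # e.g., an upstream gradient w.r.t. generator outputs
    print(sanitize_gradient(g).shape)  # torch.Size([64, 128])
```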