Differentially private data generation techniques have become a promising solution to the data privacy challenge: they enable sharing of data while complying with rigorous privacy guarantees, which is essential for scientific progress in sensitive domains. Unfortunately, restricted by the inherent complexity of modeling high-dimensional distributions, existing private generative models struggle to produce synthetic samples with high utility. In contrast to existing works that aim to fit the complete data distribution, we directly optimize a small set of samples that are representative of the distribution, under the supervision of discriminative information from downstream tasks; this is generally an easier task and more suitable for private training. Our work provides an alternative view of differentially private generation of high-dimensional data and introduces a simple yet effective method that greatly improves sample utility over state-of-the-art approaches.