Personal data collected at scale promises to improve decision-making and accelerate innovation. However, sharing and using such data raises serious privacy concerns. A promising solution is to produce synthetic data: artificial records that are shared in place of the real data. Since synthetic records are not linked to real persons, this intuitively prevents classical re-identification attacks. However, this alone is insufficient to protect privacy. Here we present TAPAS, a toolbox of attacks to evaluate synthetic data privacy under a wide range of scenarios. These attacks include both generalizations of prior work and novel attacks. We also introduce a general framework for reasoning about privacy threats to synthetic data and showcase TAPAS on several examples.