Synthetic Data -- Anonymisation Groundhog Day - Zhuanzhi (专知) Papers

22 September 2021

Synthetic Data -- Anonymisation Groundhog Day

Theresa Stadler, Bristena Oprisanu, Carmela Troncoso

Synthetic data has been advertised as a silver-bullet solution to privacy-preserving data publishing that addresses the shortcomings of traditional anonymisation techniques. The promise is that synthetic data drawn from generative models preserves the statistical properties of the original dataset but, at the same time, provides perfect protection against privacy attacks. In this work, we present the first quantitative evaluation of the privacy gain of synthetic data publishing and compare it to that of previous anonymisation techniques. Our evaluation of a wide range of state-of-the-art generative models demonstrates that synthetic data either does not prevent inference attacks or does not retain data utility. In other words, we empirically show that synthetic data does not provide a better tradeoff between privacy and utility than traditional anonymisation techniques. Furthermore, in contrast to traditional anonymisation, the privacy-utility tradeoff of synthetic data publishing is hard to predict. Because it is impossible to predict what signals a synthetic dataset will preserve and what information will be lost, synthetic data leads to a highly variable privacy gain and unpredictable utility loss. In summary, we find that synthetic data is far from the holy grail of privacy-preserving data publishing.

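Related content

Generative models

In machine learning, a generative model can be used to model data directly (for example, by sampling data according to the probability density function of a variable) or to establish conditional probability distributions between variables. A conditional probability distribution can be derived from a generative model via Bayes' theorem.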
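As a rough illustration of the two uses described above, the following sketch builds a toy generative model with two classes, each emitting a one-dimensional Gaussian feature. It first samples data from the joint distribution and then applies Bayes' theorem to obtain the conditional distribution of the class given an observation. The class labels, priors, and Gaussian parameters are all hypothetical values chosen for illustration; they do not come from the paper listed on this page.

```python
import math
import random

# Hypothetical toy model: class prior p(y) and Gaussian likelihood p(x | y).
PRIOR = {"a": 0.5, "b": 0.5}
PARAMS = {"a": (0.0, 1.0), "b": (3.0, 1.0)}  # (mean, std) per class


def gaussian_pdf(x, mean, std):
    """Probability density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sample(rng):
    """Draw (x, y) from the joint p(x, y) = p(y) * p(x | y)."""
    labels = list(PRIOR)
    y = rng.choices(labels, weights=[PRIOR[k] for k in labels])[0]
    mean, std = PARAMS[y]
    return rng.gauss(mean, std), y


def posterior(x):
    """Bayes' theorem: p(y | x) = p(x | y) p(y) / sum_k p(x | k) p(k)."""
    joint = {y: gaussian_pdf(x, *PARAMS[y]) * PRIOR[y] for y in PRIOR}
    total = sum(joint.values())
    return {y: v / total for y, v in joint.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample(rng))       # one synthetic (feature, label) pair
    print(posterior(1.5))    # class probabilities for an observed feature
```

With equal priors and equal standard deviations, an observation halfway between the two class means is assigned to each class with equal probability, while an observation at one class mean is assigned mostly to that class.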
