Synthetic Data -- Anonymisation Groundhog Day - 专知论文


June 10, 2021

Synthetic Data -- Anonymisation Groundhog Day


Theresa Stadler, Bristena Oprisanu, Carmela Troncoso

Synthetic data has been advertised as a silver-bullet solution to privacy-preserving data publishing that addresses the shortcomings of traditional anonymisation techniques. The promise is that synthetic data drawn from generative models preserves the statistical properties of the original dataset but, at the same time, provides perfect protection against privacy attacks. In this work, we present the first quantitative evaluation of the privacy gain of synthetic data publishing and compare it to that of previous anonymisation techniques. Our evaluation of a wide range of state-of-the-art generative models demonstrates that synthetic data either does not prevent inference attacks or does not retain data utility. In other words, we empirically show that synthetic data suffers from the same limitations as traditional anonymisation techniques. Furthermore, we find that, in contrast to traditional anonymisation, the privacy-utility tradeoff of synthetic data publishing is hard to predict. Because it is impossible to predict what signals a synthetic dataset will preserve and what information will be lost, synthetic data leads to a highly variable privacy gain and unpredictable utility loss. In summary, we find that synthetic data is far from the holy grail of privacy-preserving data publishing.


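Related content

Generative models

In machine learning, a generative model can be used to model data directly (for example, by sampling data from the probability density function of a variable) or to establish conditional probability distributions between variables. A conditional probability distribution can be derived from a generative model via Bayes' theorem.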
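As a rough illustration of the two uses described above, the following minimal sketch defines a toy generative model, samples from its joint distribution, and derives a conditional distribution with Bayes' theorem. The labels, words, probabilities, and the function names `sample` and `posterior` are all hypothetical and chosen only for illustration; they do not come from the paper or from any particular library.

```python
import random

# Hypothetical toy model: a prior p(y) and a class-conditional likelihood p(x | y).
PRIOR = {"spam": 0.4, "ham": 0.6}
LIKELIHOOD = {
    "spam": {"offer": 0.7, "meeting": 0.3},
    "ham": {"offer": 0.2, "meeting": 0.8},
}


def sample(rng):
    """Draw one (x, y) pair from the joint p(x, y) = p(y) * p(x | y)."""
    y = rng.choices(list(PRIOR), weights=list(PRIOR.values()))[0]
    words = LIKELIHOOD[y]
    x = rng.choices(list(words), weights=list(words.values()))[0]
    return x, y


def posterior(x):
    """Return p(y | x) for every label y, using Bayes' theorem."""
    joint = {y: PRIOR[y] * LIKELIHOOD[y][x] for y in PRIOR}
    evidence = sum(joint.values())  # p(x), the normalising constant
    return {y: p / evidence for y, p in joint.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample(rng) for _ in range(3)])  # synthetic (x, y) records
    print(posterior("offer"))               # conditional distribution p(y | x="offer")
```

Sampling from the joint distribution is the sense in which such a model can produce synthetic records, while `posterior` shows how the same model yields a conditional distribution between variables.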

