Private Tabular Survey Data Products through Synthetic Microdata Generation - Zhuanzhi Papers


Weight · Estimation/Estimator · Likelihood · Sample · Unit ·

September 2, 2021


Jingchen Hu, Terrance D. Savitsky, Matthew R. Williams

We propose two synthetic microdata approaches to generate private tabular survey data products for public release. To produce tabular products from survey data, we adapt a pseudo posterior mechanism that downweights each record's likelihood contribution by a weight in $[0,1]$ based on that record's identification disclosure risk. Applied to an observed survey database, our method achieves an asymptotic global probabilistic differential privacy guarantee. Both approaches jointly synthesize the observed sample distribution of the outcome and the survey weights, so that the two quantities together carry a privacy guarantee. The privacy-protected outcomes and survey weights are then used to construct tabular cell estimates (treating the cell inclusion indicators as known and public) and their associated standard errors, correcting for survey sampling bias. Through a real data application to the Survey of Doctorate Recipients public use file, and through simulation studies motivated by that application, we demonstrate that our two microdata synthesis approaches to constructing tabular products preserve utility better than the additive-noise approach of the Laplace Mechanism. Moreover, our approaches allow microdata to be released to the public, enabling additional analyses at no extra privacy cost.
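
Downweighting each record's likelihood contribution amounts to a pseudo posterior of the form $\xi^{\alpha}(\theta \mid \mathbf{x}) \propto \left[\prod_{i=1}^{n} p(x_i \mid \theta)^{\alpha_i}\right] \xi(\theta)$ with $\alpha_i \in [0,1]$. The Python sketch below illustrates that idea on a toy survey file; it is not the authors' model. Everything in it is an assumption for illustration: independent normal models with known standard deviations for the outcome and the log survey weight, conjugate normal priors, and a hypothetical risk score (a record's largest absolute joint log-likelihood over posterior draws, rescaled to $[0,1]$) standing in for the identification disclosure risk used in the paper. The same $\alpha_i$ multiplies both parts of a record's likelihood, mirroring the joint synthesis of outcome and weight. The sketch does not compute a privacy budget.

```python
import numpy as np

def normal_loglik(x, mean_draws, sd):
    # log N(x_i | mean_s, sd^2) for every (draw s, record i) pair -> shape (S, n)
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x[None, :] - mean_draws[:, None]) / sd) ** 2

def conj_normal_mean(x, alpha, sd, mu0=0.0, tau0=1e-2):
    # Pseudo posterior of a normal mean (known sd, N(mu0, 1/tau0) prior)
    # when record i's likelihood is raised to the power alpha_i.
    prec = tau0 + alpha.sum() / sd**2
    mean = (tau0 * mu0 + (alpha * x).sum() / sd**2) / prec
    return mean, 1.0 / np.sqrt(prec)

def risk_weights(ll):
    # Hypothetical risk score: each record's largest |log-likelihood| over the
    # draws, min-max rescaled to [0, 1]; alpha_i = 1 - risk_i.
    lip = np.abs(ll).max(axis=0)
    span = lip.max() - lip.min()
    risk = (lip - lip.min()) / span if span > 0 else np.zeros_like(lip)
    return 1.0 - risk

rng = np.random.default_rng(0)
y = np.append(rng.normal(50, 5, 199), 150.0)    # toy outcome; last record is an outlier
logw = np.append(rng.normal(3, 0.5, 199), 3.0)  # toy log survey weights
sd_y, sd_w = 5.0, 0.5                           # treated as known for brevity

# 1) Unweighted posteriors, used only to score each record's risk.
my, sy = conj_normal_mean(y, np.ones_like(y), sd_y)
mw, sw = conj_normal_mean(logw, np.ones_like(logw), sd_w)
S = 1000
ll = (normal_loglik(y, rng.normal(my, sy, S), sd_y)
      + normal_loglik(logw, rng.normal(mw, sw, S), sd_w))
alpha = risk_weights(ll)  # one weight per record, shared by y and log w

# 2) Risk-weighted pseudo posteriors for both quantities.
my_pp, sy_pp = conj_normal_mean(y, alpha, sd_y)
mw_pp, sw_pp = conj_normal_mean(logw, alpha, sd_w)

# 3) Synthetic (outcome, weight) pairs from the pseudo posterior predictive.
y_syn = rng.normal(rng.normal(my_pp, sy_pp), sd_y, y.size)
w_syn = np.exp(rng.normal(rng.normal(mw_pp, sw_pp), sd_w, y.size))
```

In this toy run the outlying record receives $\alpha_i = 0$, so the pseudo posterior mean of the outcome sits closer to the bulk of the data than the unweighted posterior mean does; the pairs `y_syn`, `w_syn` play the role of the released synthetic microdata.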

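Once synthetic outcomes and weights are available, a tabular cell estimate and its standard error can be computed from them with the public cell-inclusion indicators. The Python sketch below uses a survey-weighted (Hájek) cell mean and, as an assumption since the abstract does not specify the variance estimator, the with-replacement Taylor linearization approximation for a single-stage design. The function name `cell_estimates` and the toy data are hypothetical.

```python
import numpy as np

def cell_estimates(y, w, cell):
    """Survey-weighted cell means with linearized standard errors.

    Assumes a single-stage design and the with-replacement Taylor
    linearization approximation; cell membership is public.
    """
    y, w, cell = np.asarray(y, float), np.asarray(w, float), np.asarray(cell)
    n = y.size
    out = {}
    for c in np.unique(cell):
        d = cell == c                      # public cell-inclusion indicator
        n_hat = w[d].sum()                 # estimated cell population size
        mu = np.sum(w[d] * y[d]) / n_hat   # weighted (Hajek) cell mean
        # Linearized scores are zero outside the cell, so only members contribute.
        z = w[d] * (y[d] - mu) / n_hat
        se = np.sqrt(n / (n - 1) * np.sum(z ** 2))
        out[str(c)] = (float(mu), float(se))
    return out

# Toy data (hypothetical); in practice y and w would be the synthetic values.
est = cell_estimates(y=[10, 20, 30, 40], w=[1, 3, 2, 2], cell=["a", "a", "b", "b"])
```

Here cell `a` has a weighted mean of 17.5 rather than the unweighted 15, because its second record represents three population units; this reweighting is what corrects for unequal selection probabilities.
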

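For comparison, an additive-noise baseline adds Laplace noise with scale equal to sensitivity divided by $\epsilon$ to each released statistic. The Python sketch below applies the standard Laplace Mechanism to survey-weighted cell totals; it is an illustrative configuration, not necessarily the one used in the paper. It assumes public a priori bounds `w_max` on the weights and `y_bound` on $|y_i|$, and add/remove-one-record neighbors: one record then changes exactly one cell total by at most `w_max * y_bound`, which is the L1 sensitivity of the whole vector of disjoint cell totals. The function name and toy values are hypothetical.

```python
import numpy as np

def laplace_cell_totals(y, w, cell, epsilon, w_max, y_bound, rng):
    """Release survey-weighted cell totals with Laplace noise.

    Assumes 0 <= w_i <= w_max and |y_i| <= y_bound for every record, so
    adding or removing one record moves one cell total by at most
    w_max * y_bound. Cells are disjoint, so that bound is the L1
    sensitivity of the whole vector of totals.
    """
    y, w, cell = np.asarray(y, float), np.asarray(w, float), np.asarray(cell)
    scale = w_max * y_bound / epsilon  # Laplace scale b = sensitivity / epsilon
    noisy = {}
    for c in np.unique(cell):
        d = cell == c
        true_total = np.sum(w[d] * y[d])
        noisy[str(c)] = float(true_total + rng.laplace(0.0, scale))
    return noisy, scale

rng = np.random.default_rng(1)
noisy, scale = laplace_cell_totals(
    y=[10, 20, 30, 40], w=[1, 3, 2, 2], cell=["a", "a", "b", "b"],
    epsilon=1.0, w_max=5.0, y_bound=50.0, rng=rng)
```

In this toy setup the noise scale is 250, larger than either true cell total (70 and 140), and unlike the synthetic-microdata route it yields only the noisy totals rather than record-level data.
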
