One Step to Efficient Synthetic Data

November 12, 2021

Jordan Awan, Zhanrui Cai

From arXiv; 17 pages before appendices and references.

A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data, which is widely applicable for parametric models, has asymptotically efficient summary statistics, and is both easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, as well as fully synthetic data which satisfy the strong guarantee of differential privacy (DP), both with the same asymptotic guarantees. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on synthetic data, our procedure can also be used to perform approximate hypothesis tests in the presence of intractable likelihood functions.
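The inefficiency claim can be illustrated with a toy simulation. The sketch below is not the authors' procedure or experiment: it assumes a normal location model with known variance, and the function name `simulate` and all parameter values are hypothetical. It fits the mean on real data, samples synthetic data from the fitted model, and compares the scaled variance of the estimator on real versus synthetic data. In this model the synthetic estimator carries both the original sampling noise and the resampling noise, so its variance is roughly double.

```python
import numpy as np


def simulate(n=200, reps=5000, theta=1.0, sigma=1.0, seed=0):
    """Compare n * Var(estimator) on real data vs. naive synthetic data.

    Assumed toy model: X_i ~ N(theta, sigma^2) with sigma known,
    and the estimator is the sample mean.
    """
    rng = np.random.default_rng(seed)

    # Real datasets: one row per replication.
    x = rng.normal(theta, sigma, size=(reps, n))
    theta_hat = x.mean(axis=1)

    # Naive synthetic data: sample from the fitted model N(theta_hat, sigma^2).
    z = rng.normal(theta_hat[:, None], sigma, size=(reps, n))
    theta_syn = z.mean(axis=1)

    # Scale by n so the efficient value is sigma^2.
    return n * theta_hat.var(), n * theta_syn.var()


var_real, var_syn = simulate()
print(f"n * Var(real-data estimator):      {var_real:.3f}")
print(f"n * Var(synthetic-data estimator): {var_syn:.3f}")
print(f"ratio: {var_syn / var_real:.3f}")
```

With `sigma = 1`, the first number should land near 1 and the second near 2, which is the efficiency loss the abstract attributes to sampling from a fitted model.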

