One Step to Efficient Synthetic Data

August 10, 2022

Jordan Awan, Zhanrui Cai

From arXiv; 16 pages before appendices and references.

A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data, which is widely applicable for parametric models, has asymptotically efficient summary statistics, and is both easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, as well as fully synthetic data which satisfy the strong guarantee of differential privacy (DP), both with the same asymptotic guarantees. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on synthetic data, our procedure can also be used to perform approximate hypothesis tests in the presence of intractable likelihood functions.
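The inefficiency claim for the sample-from-a-fitted-model approach can be illustrated with a toy simulation. The sketch below is not the authors' procedure or code: it assumes a unit-variance normal model with unknown mean, and the function names, sample size, and repetition count are all hypothetical choices made for illustration. In this toy model the synthetic sample mean carries the sampling noise of the fitted mean plus fresh sampling noise, so its variance is 1/n + 1/n = 2/n, roughly twice that of the estimator computed on the real data.

```python
import random
import statistics


def naive_synthetic_mean(n, theta, rng):
    """Fit N(theta, 1) to real data, then sample synthetic data from the fit.

    Returns the estimate from the real data and the estimate recomputed
    on the synthetic data (both are sample means).
    """
    real = [rng.gauss(theta, 1.0) for _ in range(n)]
    theta_hat = statistics.fmean(real)  # MLE of the mean
    synthetic = [rng.gauss(theta_hat, 1.0) for _ in range(n)]
    return theta_hat, statistics.fmean(synthetic)


def variance_ratio(n=50, theta=0.0, reps=4000, seed=0):
    """Monte Carlo estimate of Var(synthetic estimate) / Var(real estimate)."""
    rng = random.Random(seed)
    pairs = [naive_synthetic_mean(n, theta, rng) for _ in range(reps)]
    var_real = statistics.pvariance([p[0] for p in pairs])
    var_syn = statistics.pvariance([p[1] for p in pairs])
    return var_syn / var_real


if __name__ == "__main__":
    # Theory for this toy model predicts a ratio close to 2.
    print(f"variance ratio: {variance_ratio():.2f}")
```

A ratio near 2 means an analyst working only with the synthetic sample would need about twice as many observations to match the precision of the original estimate, which is the kind of efficiency loss the abstract describes.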
