Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data - 专知 (Zhuanzhi) Papers

Synthetic data · Synthesis · Model generation · Machine learning · Data augmentation

April 7, 2023

Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data

Boris van Breugel, Mihaela van der Schaar

Generating synthetic data through generative models is gaining interest in the ML community and beyond. In the past, synthetic data was often regarded as a means to private data release, but a surge of recent papers explore how its potential reaches much further than this -- from creating more fair data to data augmentation, and from simulation to text generated by ChatGPT. In this perspective we explore whether, and how, synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs. Just as importantly, we discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data -- the most important of which is quantifying how much we can trust any finding or prediction drawn from synthetic data.
