Private Synthetic Data for Multitask Learning and Marginal Queries - Zhuanzhi Papers

Marginalization · Learning · Linear · Model evaluation · Class labels

September 15, 2022

Private Synthetic Data for Multitask Learning and Marginal Queries

Giuseppe Vietri, Cedric Archambeau, Sergul Aydore, William Brown, Michael Kearns, Aaron Roth, Ankit Siva, Shuai Tang, Zhiwei Steven Wu

From arXiv. The short version of this paper appears in the proceedings of NeurIPS 2022.

We provide a differentially private algorithm for producing synthetic data that is simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to handle numerical features directly, in contrast to a number of related prior approaches, which require numerical features to first be converted into high-cardinality categorical features via a binning strategy. Higher binning granularity is required for better accuracy, but this negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data that preserves large numbers of statistical queries, such as marginals on numerical features and class-conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in both the real and synthetic data. This is the property needed to train a linear classifier in a multitask setting. Our algorithm also allows us to produce high-quality synthetic data for mixed marginal queries, which combine both categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques and provides significant accuracy improvements in both marginal queries and linear prediction tasks for mixed-type datasets.
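
To make the class-conditional linear threshold property concrete, here is a minimal sketch of how such a query could be evaluated on a real and a synthetic dataset. It is not the paper's algorithm and involves no differential privacy; it only illustrates the quantity that the abstract says should be preserved. The function names, the toy data, and the choice to measure the fraction over all rows (rather than within each class) are assumptions made for illustration.

```python
def threshold_query(rows, labels, weights, threshold, target_label):
    """Fraction of all rows that carry `target_label` and lie strictly
    above the half-space boundary, i.e. <weights, x> > threshold.
    (Hypothetical helper; the normalization is an assumption.)"""
    hits = 0
    for x, y in zip(rows, labels):
        score = sum(w * v for w, v in zip(weights, x))
        if y == target_label and score > threshold:
            hits += 1
    return hits / len(rows)


def max_query_error(real, synth, queries):
    """Largest absolute gap between real and synthetic answers
    over a list of (weights, threshold, target_label) queries."""
    real_rows, real_labels = real
    synth_rows, synth_labels = synth
    return max(
        abs(
            threshold_query(real_rows, real_labels, w, t, y)
            - threshold_query(synth_rows, synth_labels, w, t, y)
        )
        for w, t, y in queries
    )


# Toy data with two numerical features used as-is, with no binning step.
real_rows = [(0.2, 0.9), (0.8, 0.1), (0.5, 0.5), (0.9, 0.7)]
real_labels = [1, 0, 1, 0]
synth_rows = [(0.25, 0.8), (0.7, 0.2), (0.6, 0.6), (0.85, 0.75)]
synth_labels = [1, 0, 1, 0]

# One half-space (x1 + x2 > 1.0), queried once per class label.
queries = [((1.0, 1.0), 1.0, 1), ((1.0, 1.0), 1.0, 0)]

error = max_query_error(
    (real_rows, real_labels), (synth_rows, synth_labels), queries
)
print(error)
```

A small maximum error over many such half-spaces and labels is the sense in which synthetic data "preserves" these queries. Because the queries act on raw numerical values, the sketch needs no discretization of the features.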

