适应性数据分析的子抽样补充 (Subsampling Suffices for Adaptive Data Analysis) - 专知论文

会员服务 ·

0

子采样 · 统计量 · Analysis · SimPLe · 数据分析 ·

2023 年 2 月 17 日

Subsampling Suffices for Adaptive Data Analysis

翻译：适应性数据分析的子抽样补充

from arxiv, To appear at STOC 2023

Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down in the common setting where a dataset is reused for multiple, adaptively chosen, queries. This problem of \emph{adaptive data analysis} was formalized in the seminal works of Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014). We identify a remarkably simple set of assumptions under which the queries will continue to be representative even when chosen adaptively: The only requirements are that each query takes as input a random subsample and outputs few bits. This result shows that the noise inherent in subsampling is sufficient to guarantee that query responses generalize. The simplicity of this subsampling-based framework allows it to model a variety of real-world scenarios not covered by prior work. In addition to its simplicity, we demonstrate the utility of this framework by designing mechanisms for two foundational tasks, statistical queries and median finding. In particular, our mechanism for answering the broadly applicable class of statistical queries is both extremely simple and state of the art in many parameter regimes.

翻译：多数古典技术假定,数据集独立于分析员的查询,并在通用环境中细分,将数据集重新用于多种适应性选择的查询。Dwork et al.(STOC,2015年)和Hardt and Ullman(FOCS,2014年)的开创性著作中正式确定了对数据集进行分析的问题。我们确定了一套非常简单的假设,根据这些假设,即使选择适应性,查询也将继续具有代表性:唯一的要求是,每个查询作为输入随机子抽样和产出,只有很少的几位。这一结果表明,子抽样中固有的噪音足以保证查询答复的笼统性。这一子抽样框架的简单化使得它能够模拟先前工作没有涵盖的各种现实世界情景。除了其简单性外,我们还通过设计两个基本任务、统计查询和中位发现机制来证明这一框架的实用性。特别是,我们用来回答广泛应用的统计查询的分类机制非常简单,也是许多参数的参数。

0

相关内容

子采样

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

专知会员服务

66+阅读 · 2023年2月15日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

246+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

抗MRSA活性rhodomyrtosone B类似物的合成和构效关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

空间分数阶Schr？dinger方程的时间分裂谱方法

国家自然科学基金

0+阅读 · 2014年12月31日

杨梅雌雄性别决定的遗传和基因组学基础

国家自然科学基金

0+阅读 · 2014年12月31日

土壤团聚体对致病性尖孢镰刀菌的作用机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

面向认知Ad hoc网络多信道路由的信任管理研究

国家自然科学基金

0+阅读 · 2013年12月31日

Massive MIMO系统关键技术的研究

国家自然科学基金

0+阅读 · 2012年12月31日

中国小麦地方品种抗白粉病基因Pm5e精细遗传图谱绘制

国家自然科学基金

0+阅读 · 2012年12月31日

服务驱动的空间数据起源建模、追踪和共享

国家自然科学基金

0+阅读 · 2012年12月31日

病理性近视易感基因研究

国家自然科学基金

0+阅读 · 2009年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

Constrained multi-objective optimization of process design parameters in settings with scarce data: an application to adhesive bonding

Arxiv

0+阅读 · 2023年4月10日

Contraction-Guided Adaptive Partitioning for Reachability Analysis of Neural Network Controlled Systems

Arxiv

0+阅读 · 2023年4月7日

GAS: A Gaussian Mixture Distribution-Based Adaptive Sampling Method for PINNs

Arxiv

0+阅读 · 2023年4月7日

Doubly Stochastic Matrix Models for Estimation of Distribution Algorithms

Arxiv

0+阅读 · 2023年4月5日

Adaptive Data Augmentation for Contrastive Learning

Arxiv

0+阅读 · 2023年4月5日

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

Arxiv

0+阅读 · 2023年4月4日

Semiparametric efficient estimation of genetic relatedness with double machine learning

Arxiv

0+阅读 · 2023年4月4日

Statistical Analysis of Chen Distribution Under Improved Adaptive Type-II Progressive Censoring

Arxiv

0+阅读 · 2023年4月1日

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Arxiv

19+阅读 · 2021年4月19日

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Arxiv

23+阅读 · 2021年3月3日

VIP会员

文章信息

相关主题

相关VIP内容

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

专知会员服务

66+阅读 · 2023年2月15日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

246+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型智能体强化学习：全景综述

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

【伯克利博士论文】从推理服务到训练：面向大规模 LLM 智能体的高效系统

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

Constrained multi-objective optimization of process design parameters in settings with scarce data: an application to adhesive bonding

Arxiv

0+阅读 · 2023年4月10日

Contraction-Guided Adaptive Partitioning for Reachability Analysis of Neural Network Controlled Systems

Arxiv

0+阅读 · 2023年4月7日

GAS: A Gaussian Mixture Distribution-Based Adaptive Sampling Method for PINNs

Arxiv

0+阅读 · 2023年4月7日

Doubly Stochastic Matrix Models for Estimation of Distribution Algorithms

Arxiv

0+阅读 · 2023年4月5日

Adaptive Data Augmentation for Contrastive Learning

Arxiv

0+阅读 · 2023年4月5日

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

Arxiv

0+阅读 · 2023年4月4日

Semiparametric efficient estimation of genetic relatedness with double machine learning

Arxiv

0+阅读 · 2023年4月4日

Statistical Analysis of Chen Distribution Under Improved Adaptive Type-II Progressive Censoring

Arxiv

0+阅读 · 2023年4月1日

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Arxiv

19+阅读 · 2021年4月19日

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Arxiv

23+阅读 · 2021年3月3日

相关基金

抗MRSA活性rhodomyrtosone B类似物的合成和构效关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

空间分数阶Schr？dinger方程的时间分裂谱方法

国家自然科学基金

0+阅读 · 2014年12月31日

杨梅雌雄性别决定的遗传和基因组学基础

国家自然科学基金

0+阅读 · 2014年12月31日

土壤团聚体对致病性尖孢镰刀菌的作用机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

面向认知Ad hoc网络多信道路由的信任管理研究

国家自然科学基金

0+阅读 · 2013年12月31日

Massive MIMO系统关键技术的研究

国家自然科学基金

0+阅读 · 2012年12月31日

中国小麦地方品种抗白粉病基因Pm5e精细遗传图谱绘制

国家自然科学基金

0+阅读 · 2012年12月31日

服务驱动的空间数据起源建模、追踪和共享

国家自然科学基金

0+阅读 · 2012年12月31日

病理性近视易感基因研究

国家自然科学基金

0+阅读 · 2009年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员