测试学习算法的分布分布假设 (Testing distributional assumptions of learning algorithms) - 专知论文

会员服务 ·

0

学成 · 高斯分布 · UniFormer · Pair · contrastive ·

2022 年 4 月 14 日

Testing distributional assumptions of learning algorithms

翻译：测试学习算法的分布分布假设

Ronitt Rubinfeld,Arsen Vasilyan

There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and for monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Through these lower bounds we exhibit natural problems where there is a dramatic gap between standard agnostic learning run-time and the run-time of the best tester-learner pair.

翻译：有许多重要的高维函数类, 当对示例分布的强烈假设可以做出时, 具有快速的不可知性学习算法。但是, 人们怎么能足够自信地相信数据确实满足了分布性假设, 这样人们就可以信任该不可知性学习算法的输出质量? 我们提出了一个模型, 用来系统研究测试- lear 配对的设计 $ (\ mathcal{A},\ mathcal{T} $ 。这样, 当数据分布的示例的分布通过测试者 $\ mathcal{T} 时, 人们就可以安全地信任数据上的不可知性学习者 $\ mathcal{A} 的输出结果。为了展示模型的能量, 我们把它应用到在标准高尔氏分布下的经典学习半空空间的经典问题, 并且用一个测试- 测试- 双级双级的混合运行时间 {O} (1/\\ epclonal- 4} 这样的数据流流流。这个质量运算中, 也显示已知的正常的磁性分析。

0

相关内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相场方程的弱超内罚间断Galerkin方法及其自适应算法

国家自然科学基金

1+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

稀土MOF纳米荧光探针的设计合成及其生物应用

国家自然科学基金

0+阅读 · 2013年12月31日

Partial Spread Bent函数与Bent-Negabent函数的构造及密码学性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于多性能退化参数的装备关键系统实时可靠性评估与预测方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

非对称矩阵优化问题的灵敏度分析、算法及其应用

国家自然科学基金

0+阅读 · 2012年12月31日

HER2/uPAR通路调控乳腺肿瘤休眠和细胞周期的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

蜡样芽孢杆菌Bacillus cereus 905锰超氧化物歧化酶（MnSOD）基因表达调控途径的研究

国家自然科学基金

0+阅读 · 2011年12月31日

牦牛乳腺泌乳启动期基因表达谱及初乳期优质基因筛选

国家自然科学基金

0+阅读 · 2011年12月31日

电控旋翼自适应控制方法研究

国家自然科学基金

1+阅读 · 2008年12月31日

Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning

Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning

Arxiv

0+阅读 · 2022年6月6日

Optimal Stopping Theory for a Distributionally Robust Seller

Optimal Stopping Theory for a Distributionally Robust Seller

Arxiv

0+阅读 · 2022年6月6日

Single- and Multi-Objective Evolutionary Algorithms for the Knapsack Problem with Dynamically Changing Constraints

Arxiv

0+阅读 · 2022年6月5日

Debiased Machine Learning without Sample-Splitting for Stable Estimators

Arxiv

0+阅读 · 2022年6月3日

Approximate confidence distribution computing

Arxiv

0+阅读 · 2022年6月3日

On the Benefits of Large Learning Rates for Kernel Methods

Arxiv

0+阅读 · 2022年6月3日

Linear Algorithms for Nonparametric Multiclass Probability Estimation

Arxiv

0+阅读 · 2022年6月3日

Sequential Permutation Testing of Random Forest Variable Importance Measures

Arxiv

0+阅读 · 2022年6月2日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

The Causal Learning of Retail Delinquency

Arxiv

14+阅读 · 2020年12月17日

VIP会员

文章信息

相关主题

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

面向向量的机器学习系统：跨栈方法

【ICCV2025】SO(3) 上连续非保守动力系统的预测

【ICCV2025】Lay2Story：扩展扩散 Transformer 以实现可切换布局的故事生成

人工智能治理全景综述

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning

Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning

Arxiv

0+阅读 · 2022年6月6日

Optimal Stopping Theory for a Distributionally Robust Seller

Optimal Stopping Theory for a Distributionally Robust Seller

Arxiv

0+阅读 · 2022年6月6日

Single- and Multi-Objective Evolutionary Algorithms for the Knapsack Problem with Dynamically Changing Constraints

Arxiv

0+阅读 · 2022年6月5日

Debiased Machine Learning without Sample-Splitting for Stable Estimators

Arxiv

0+阅读 · 2022年6月3日

Approximate confidence distribution computing

Arxiv

0+阅读 · 2022年6月3日

On the Benefits of Large Learning Rates for Kernel Methods

Arxiv

0+阅读 · 2022年6月3日

Linear Algorithms for Nonparametric Multiclass Probability Estimation

Arxiv

0+阅读 · 2022年6月3日

Sequential Permutation Testing of Random Forest Variable Importance Measures

Arxiv

0+阅读 · 2022年6月2日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

The Causal Learning of Retail Delinquency

Arxiv

14+阅读 · 2020年12月17日

相关基金

相场方程的弱超内罚间断Galerkin方法及其自适应算法

国家自然科学基金

1+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

稀土MOF纳米荧光探针的设计合成及其生物应用

国家自然科学基金

0+阅读 · 2013年12月31日

Partial Spread Bent函数与Bent-Negabent函数的构造及密码学性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于多性能退化参数的装备关键系统实时可靠性评估与预测方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

非对称矩阵优化问题的灵敏度分析、算法及其应用

国家自然科学基金

0+阅读 · 2012年12月31日

HER2/uPAR通路调控乳腺肿瘤休眠和细胞周期的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

蜡样芽孢杆菌Bacillus cereus 905锰超氧化物歧化酶（MnSOD）基因表达调控途径的研究

国家自然科学基金

0+阅读 · 2011年12月31日

牦牛乳腺泌乳启动期基因表达谱及初乳期优质基因筛选

国家自然科学基金

0+阅读 · 2011年12月31日

电控旋翼自适应控制方法研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员