With the deluge of digitized information in the Big Data era, massive datasets are becoming increasingly available for learning predictive models. However, in many practical situations, poor control of the data acquisition process may jeopardize the outputs of machine learning algorithms, and selection bias issues are now the subject of much attention in the literature. The present article investigates how to extend Empirical Risk Minimization, the principal paradigm of statistical learning, to the case where training observations are generated from biased models, i.e., from distributions that differ from the test/prediction distribution while being absolutely continuous with respect to it. Precisely, we show how to build a "nearly debiased" training statistical population from the biased samples and the related biasing functions, following in the footsteps of the approach originally proposed in Vardi (1985). We then study, from a nonasymptotic perspective, the performance of minimizers of an empirical version of the risk computed from the statistical population thus constructed. Remarkably, the learning rate achieved by this procedure is of the same order as that attained in the absence of selection bias. Beyond these theoretical guarantees, we also present experimental results supporting the relevance of the algorithmic approach promoted in this article.
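To fix ideas, the debiasing step described above can be thought of, in the simplest single-sample case, as reweighting the biased observations by the inverse of the known biasing function. The sketch below is our own minimal illustration, not the procedure of the article: the biasing function `omega`, the uniform target distribution, and the rejection-sampling scheme are assumptions made purely for the example. It shows how self-normalized weights proportional to 1/omega(X_i) recover a statistic of the test distribution from a biased sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def omega(x):
    # Hypothetical biasing function: larger values of x are over-sampled.
    return 1.0 + x

def biased_sample(n):
    # Draw from the biased distribution (density proportional to omega on [0, 1])
    # by rejection sampling from the uniform target distribution.
    out = []
    while len(out) < n:
        x = rng.uniform(0.0, 1.0)
        if rng.uniform(0.0, 2.0) < omega(x):  # accept with probability omega(x) / max(omega)
            out.append(x)
    return np.array(out)

x = biased_sample(5000)

# Self-normalized importance weights 1/omega(X_i): they approximately restore
# expectations taken under the (unbiased) test distribution.
w = 1.0 / omega(x)
w /= w.sum()

print("naive mean on the biased sample:", x.mean())       # close to 5/9 (approx. 0.56)
print("reweighted (debiased) mean:     ", np.sum(w * x))  # close to the target mean 0.5
```

The same weights can be plugged into a weighted empirical risk (e.g., via the `sample_weight` argument of standard learners) to obtain an importance-weighted variant of Empirical Risk Minimization in this illustrative setting.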