It has been postulated and observed in practice that, for prediction problems in which covariate data can be naturally partitioned into clusters, ensembling algorithms that suitably aggregate models trained on individual clusters often perform substantially better than methods that ignore the clustering structure in the data. In this paper, we provide theoretical support for these empirical observations by asymptotically analyzing linear least squares and random forest regressions under a linear model. Our main results demonstrate that the benefit of ensembling, compared to training a single model on the entire data set (often termed 'merging'), can depend on the underlying bias-variance interplay of the individual predictors being aggregated. In particular, under both fixed- and high-dimensional linear models, we show that merging is asymptotically superior to optimal ensembling techniques for linear least squares regression, owing to the unbiased nature of least squares prediction. In contrast, for random forest regression under fixed-dimensional linear models, our bounds imply a strict benefit of ensembling over merging. Finally, we present numerical experiments to verify the validity of our asymptotic results across different settings.
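To make the comparison concrete, the following is a minimal sketch, under an assumed shared linear model across clusters, of the two strategies contrasted above: 'merging' (one least squares fit on the pooled data) versus cluster-wise ensembling (averaging predictions from per-cluster fits). The cluster count, sample sizes, covariate shifts, and uniform aggregation weights are illustrative assumptions only; the paper studies optimally chosen ensembling weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not the paper's exact experiment): K clusters sharing
# one linear model y = X @ beta + noise, with cluster-specific covariate shifts.
K, n_per_cluster, p = 5, 200, 10
beta = rng.normal(size=p)

def ols_fit(X, y):
    """Least squares coefficients via the pseudo-inverse."""
    return np.linalg.pinv(X) @ y

# Generate clustered training data.
clusters = []
for k in range(K):
    shift = rng.normal(scale=2.0, size=p)            # cluster-specific covariate mean
    X_k = rng.normal(size=(n_per_cluster, p)) + shift
    y_k = X_k @ beta + rng.normal(size=n_per_cluster)
    clusters.append((X_k, y_k))

# "Merging": a single model trained on the pooled data.
X_all = np.vstack([X for X, _ in clusters])
y_all = np.concatenate([y for _, y in clusters])
beta_merged = ols_fit(X_all, y_all)

# "Ensembling": one model per cluster, predictions aggregated with
# uniform weights here (a simplification of the paper's optimal weighting).
betas_cluster = [ols_fit(X, y) for X, y in clusters]

# Compare out-of-sample prediction error on a fresh test cluster.
X_test = rng.normal(size=(1000, p)) + rng.normal(scale=2.0, size=p)
y_test = X_test @ beta + rng.normal(size=1000)

pred_merged = X_test @ beta_merged
pred_ensemble = np.mean([X_test @ b for b in betas_cluster], axis=0)

print("merged   MSE:", np.mean((y_test - pred_merged) ** 2))
print("ensemble MSE:", np.mean((y_test - pred_ensemble) ** 2))
```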