带有交叉估价特征选择的渐进推动树中的特性重要性 (Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection) - 专知论文

会员服务 ·

0

有偏 · Boosting（一种模型训练加速方式） · 特征选择 · 基 · Extensibility ·

2021 年 9 月 12 日

Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection

翻译：带有交叉估价特征选择的渐进推动树中的特性重要性

Afek Ilay Adler,Amichai Painsky

Gradient Boosting Machines (GBM) are among the go-to algorithms on tabular data, which produce state of the art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base learners. Specifically, most implementations utilize decision trees that are typically biased towards categorical variables with large cardinalities. The effect of this bias was extensively studied over the years, mostly in terms of predictive performance. In this work, we extend the scope and study the effect of biased base learners on GBM feature importance (FI) measures. We show that although these implementation demonstrate highly competitive predictive performance, they still, surprisingly, suffer from bias in FI. By utilizing cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost. We demonstrate the suggested framework in a variety of synthetic and real-world setups, showing a significant improvement in all GBM FI measures while maintaining relatively the same level of prediction accuracy.

翻译：在表格数据上,渐进推力机(GBM)是算法的一部分,它产生许多预测任务的最新结果。尽管它受到欢迎,但GBM框架在基础学习者中存在着根本性缺陷。具体地说,大多数执行过程都使用典型偏向于绝对变量且具有巨大基本特征的决策树。多年来,这种偏差的影响得到了广泛的研究,主要是预测性能方面的研究。在这项工作中,我们扩大了有偏见的基础学习者对GBM特征重要性(FI)措施的影响的范围并进行了研究。我们表明,尽管这些实施过程显示出高度竞争性的预测性能,但令人惊讶的是,在FI中仍然存在着偏见。我们利用交叉有效的(CV)不带偏见的基础学习者,用相对较低的计算成本来修正这一缺陷。我们在各种合成和现实世界设置中展示了所建议的框架,表明所有GBM FI措施都有很大改进,同时保持了相对相同的预测准确度。

0

相关内容

超越深度学习：梯度提升机Gradient Boosting Machines (GBM)，73页ppt

超越深度学习：梯度提升机Gradient Boosting Machines (GBM)，73页ppt

专知会员服务

52+阅读 · 2020年6月21日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【自监督学习深度神经网络视觉特征学习综述论文】Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

【自监督学习深度神经网络视觉特征学习综述论文】Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

专知会员服务

87+阅读 · 2020年3月1日

【O'Reilly TensorFlow Conference 2019】基于TensorFlow的实时流数据机器学习（Machine learning over real-time streaming data with TensorFlow）

【O'Reilly TensorFlow Conference 2019】基于TensorFlow的实时流数据机器学习（Machine learning over real-time streaming data with TensorFlow）

专知会员服务

28+阅读 · 2019年11月14日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

RF、GBDT、XGBoost面试级整理

RF、GBDT、XGBoost面试级整理

数据挖掘入门与实战

17+阅读 · 2018年3月21日

RF(随机森林)、GBDT、XGBoost面试级整理

RF(随机森林)、GBDT、XGBoost面试级整理

数据挖掘入门与实战

7+阅读 · 2018年2月6日

算法｜随机森林（Random Forest）

算法｜随机森林（Random Forest）

全球人工智能

3+阅读 · 2018年1月8日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

xgboost特征选择

xgboost特征选择

数据挖掘入门与实战

39+阅读 · 2017年10月5日

Finding the KT partition of a weighted graph in near-linear time

Arxiv

0+阅读 · 2021年11月2日

Hyperparameter Selection for Imitation Learning

Arxiv

7+阅读 · 2021年5月25日

CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

Arxiv

11+阅读 · 2021年2月18日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions

InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions

Arxiv

13+阅读 · 2019年11月1日

Generalization and Regularization in DQN

Generalization and Regularization in DQN

Arxiv

6+阅读 · 2019年1月30日

A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization

Arxiv

5+阅读 · 2019年1月24日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

Consensus Based Medical Image Segmentation Using Semi-Supervised Learning And Graph Cuts

Arxiv

11+阅读 · 2018年5月21日

Scalable attribute-aware network embedding with localily

Arxiv

3+阅读 · 2018年4月17日

VIP会员

文章信息

相关主题

Boosting（一种模型训练加速方式）

相关VIP内容

超越深度学习：梯度提升机Gradient Boosting Machines (GBM)，73页ppt

超越深度学习：梯度提升机Gradient Boosting Machines (GBM)，73页ppt

专知会员服务

52+阅读 · 2020年6月21日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【自监督学习深度神经网络视觉特征学习综述论文】Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

【自监督学习深度神经网络视觉特征学习综述论文】Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

专知会员服务

87+阅读 · 2020年3月1日

【O'Reilly TensorFlow Conference 2019】基于TensorFlow的实时流数据机器学习（Machine learning over real-time streaming data with TensorFlow）

【O'Reilly TensorFlow Conference 2019】基于TensorFlow的实时流数据机器学习（Machine learning over real-time streaming data with TensorFlow）

专知会员服务

28+阅读 · 2019年11月14日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

美国武装部队面临战车可维护性问题

《5G测试平台：探索5G在军事场景中的赋能平台》

海军无人系统：海上作战的演进而非革命

《未来无人海军系统：海上无人机效能增强与作战升级概览》2025最新93页

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

RF、GBDT、XGBoost面试级整理

RF、GBDT、XGBoost面试级整理

数据挖掘入门与实战

17+阅读 · 2018年3月21日

RF(随机森林)、GBDT、XGBoost面试级整理

RF(随机森林)、GBDT、XGBoost面试级整理

数据挖掘入门与实战

7+阅读 · 2018年2月6日

算法｜随机森林（Random Forest）

算法｜随机森林（Random Forest）

全球人工智能

3+阅读 · 2018年1月8日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

xgboost特征选择

xgboost特征选择

数据挖掘入门与实战

39+阅读 · 2017年10月5日

相关论文

Finding the KT partition of a weighted graph in near-linear time

Arxiv

0+阅读 · 2021年11月2日

Hyperparameter Selection for Imitation Learning

Arxiv

7+阅读 · 2021年5月25日

CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

Arxiv

11+阅读 · 2021年2月18日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions

InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions

Arxiv

13+阅读 · 2019年11月1日

Generalization and Regularization in DQN

Generalization and Regularization in DQN

Arxiv

6+阅读 · 2019年1月30日

A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization

Arxiv

5+阅读 · 2019年1月24日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

Consensus Based Medical Image Segmentation Using Semi-Supervised Learning And Graph Cuts

Arxiv

11+阅读 · 2018年5月21日

Scalable attribute-aware network embedding with localily

Arxiv

3+阅读 · 2018年4月17日

微信扫码咨询专知VIP会员