Gradient boosted decision trees are some of the most popular algorithms in applied machine learning. They are a flexible and powerful tool that can robustly fit nearly any tabular dataset in a scalable and computationally efficient way. Among the most critical parameters to tune when fitting these models are the various penalty terms used to distinguish signal from noise in the current model. These penalties are effective in practice, but they lack robust theoretical justification. In this paper we develop and present a novel, theoretically justified hypothesis test of split quality for gradient boosted tree ensembles and demonstrate that using this test in place of the common penalty terms leads to a significant reduction in out-of-sample loss. Additionally, the method provides a theoretically well-founded stopping condition for the tree-growing algorithm. We also present several extensions of the method, opening the door to a wide variety of novel tree-pruning algorithms.
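To make the general idea concrete, the sketch below illustrates how a significance threshold can replace a fixed gain penalty as the accept/reject criterion for a candidate split. It is only an illustration under a simplifying assumption: the statistic used here is a plain Welch two-sample t-test on per-observation gradients, not the test derived in this paper, and the function name `split_is_significant` is hypothetical.

```python
import numpy as np
from scipy import stats


def split_is_significant(gradients, feature_values, threshold, alpha=0.05):
    """Illustrative split-quality test: accept a candidate split only if the
    mean gradient differs significantly between the two child nodes.

    Uses a Welch two-sample t-test as a stand-in statistic; the paper's
    actual test statistic is developed in the body of the text.
    """
    left = gradients[feature_values <= threshold]
    right = gradients[feature_values > threshold]
    if len(left) < 2 or len(right) < 2:
        return False  # not enough observations in a child node to test
    _, p_value = stats.ttest_ind(left, right, equal_var=False)
    return p_value < alpha


# Example: a feature with no real relationship to the gradients should
# usually fail the test, so the split (and further growth) is rejected.
rng = np.random.default_rng(0)
g = rng.normal(size=1_000)        # per-observation gradients
x = rng.uniform(size=1_000)       # candidate split feature
print(split_is_significant(g, x, threshold=0.5))  # typically False
```

When no candidate split at a node passes the test, the node is left as a leaf, which is the sense in which a hypothesis test doubles as a stopping condition for tree growth.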