提高数据质量,培训 " 逐步推动决策树 " 的动态培训 (Improving Data Quality with Training Dynamics of Gradient Boosting Decision Trees) - 专知论文

会员服务 ·

0

Boosting（一种模型训练加速方式） · MoDELS · 样例 · 示例 · Learning ·

2022 年 10 月 20 日

Improving Data Quality with Training Dynamics of Gradient Boosting Decision Trees

翻译：提高数据质量,培训 " 逐步推动决策树 " 的动态培训

Moacir Antonelli Ponti,Lucas de Angelis Oliveira,Juan Martín Román,Luis Argerich

Real world datasets contain incorrectly labeled instances that hamper the performance of the model and, in particular, the ability to generalize out of distribution. Also, each example might have different contribution towards learning. This motivates studies to better understanding of the role of data instances with respect to their contribution in good metrics in models. In this paper we propose a method based on metrics computed from training dynamics of Gradient Boosting Decision Trees (GBDTs) to assess the behavior of each training example. We focus on datasets containing mostly tabular or structured data, for which the use of Decision Trees ensembles are still the state-of-the-art in terms of performance. We show results on detecting noisy labels in order to either remove them, improving models' metrics in synthetic and real datasets, as well as a productive dataset. Our methods achieved the best results overall when compared with confident learning and heuristics.

翻译：真实的世界数据集包含有错误的标签实例,妨碍模型的性能,特别是推广推广能力。此外,每个实例可能对学习有不同的贡献。这促使人们研究数据实例的作用,了解其对模型中良好指标的贡献。在本文中,我们提议了一种方法,其依据是,从“渐进推动决策树”的培训动态中计算出来的衡量尺度,以评估每个培训实例的行为。我们注重的数据集大多是表格或结构化数据,使用决策树组群仍然是业绩方面的最先进的数据。我们展示了探测噪音标签的结果,以便要么删除这些标签,改进合成和真实数据集中的模型指标,以及生产数据集。我们的方法与自信学习和超自然学相比,总体上取得了最佳的结果。

0

相关内容

Boosting（一种模型训练加速方式）

Boosting（一种模型训练加速方式）

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

53+阅读 · 2021年1月20日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

S3AGA样本（Spitzer-SDSS Spectral Atlas of Galaxies and AGNs)及其AGN研究

国家自然科学基金

0+阅读 · 2014年12月31日

再生核希尔伯特空间图像稀疏表达算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

约束Lp正则化问题算法及应用

国家自然科学基金

0+阅读 · 2012年12月31日

低能氮离子注入介导沙漠寡营养细菌全基因组突变及定向进化的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

Haccpper环境中不锈钢表面活性与电化学噪声特征研究

国家自然科学基金

0+阅读 · 2012年12月31日

根际促生菌Bacillus amyloliquefaciens SQR9与植物根系分泌物互作的分子机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

组蛋白乙酰化/去乙酰化对Myocardin诱导的心肌肥厚影响及机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

应用蛋白质组学技术筛选、鉴定循环肝癌干细胞特异性标志物

国家自然科学基金

0+阅读 · 2008年12月31日

On the Change of Decision Boundaries and Loss in Learning with Concept Drift

Arxiv

0+阅读 · 2022年12月2日

Fully-Dynamic Decision Trees

Arxiv

0+阅读 · 2022年12月1日

Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference

Arxiv

0+阅读 · 2022年11月30日

GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

Arxiv

0+阅读 · 2022年11月30日

A fully asymptotic preserving decomposed multi-group method for the frequency-dependent radiative transfer equations

Arxiv

0+阅读 · 2022年11月30日

CRU: A Novel Neural Architecture for Improving the Predictive Performance of Time-Series Data

Arxiv

0+阅读 · 2022年11月30日

A Survey of Decision Making in Adversarial Games

Arxiv

84+阅读 · 2022年7月16日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Arxiv

17+阅读 · 2021年3月19日

Few-shot Learning for Multi-label Intent Detection

Arxiv

21+阅读 · 2020年10月11日

Zero-Shot Object Detection by Hybrid Region Embedding

Arxiv

19+阅读 · 2018年5月17日

VIP会员

文章信息

相关主题

Boosting（一种模型训练加速方式）

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

53+阅读 · 2021年1月20日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

人工智能治理的未来

模态感知的特征匹配：单一模态与跨模态技术的全面综述

无监督行人重识别研究综述

【牛津博士论文】面向神经影像应用的可扩展且可解释的空间模型

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

On the Change of Decision Boundaries and Loss in Learning with Concept Drift

Arxiv

0+阅读 · 2022年12月2日

Fully-Dynamic Decision Trees

Arxiv

0+阅读 · 2022年12月1日

Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference

Arxiv

0+阅读 · 2022年11月30日

GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

Arxiv

0+阅读 · 2022年11月30日

A fully asymptotic preserving decomposed multi-group method for the frequency-dependent radiative transfer equations

Arxiv

0+阅读 · 2022年11月30日

CRU: A Novel Neural Architecture for Improving the Predictive Performance of Time-Series Data

Arxiv

0+阅读 · 2022年11月30日

A Survey of Decision Making in Adversarial Games

Arxiv

84+阅读 · 2022年7月16日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Arxiv

17+阅读 · 2021年3月19日

Few-shot Learning for Multi-label Intent Detection

Arxiv

21+阅读 · 2020年10月11日

Zero-Shot Object Detection by Hybrid Region Embedding

Arxiv

19+阅读 · 2018年5月17日

相关基金

S3AGA样本（Spitzer-SDSS Spectral Atlas of Galaxies and AGNs)及其AGN研究

国家自然科学基金

0+阅读 · 2014年12月31日

再生核希尔伯特空间图像稀疏表达算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

约束Lp正则化问题算法及应用

国家自然科学基金

0+阅读 · 2012年12月31日

低能氮离子注入介导沙漠寡营养细菌全基因组突变及定向进化的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

Haccpper环境中不锈钢表面活性与电化学噪声特征研究

国家自然科学基金

0+阅读 · 2012年12月31日

根际促生菌Bacillus amyloliquefaciens SQR9与植物根系分泌物互作的分子机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

组蛋白乙酰化/去乙酰化对Myocardin诱导的心肌肥厚影响及机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

应用蛋白质组学技术筛选、鉴定循环肝癌干细胞特异性标志物

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员