应对类别不平衡问题的集成学习和数据增强模型综述：组合、实现和评估 (A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation) - 专知论文

会员服务 ·

0

集成学习 · 不平衡 · 类别不平衡 · 不平衡数据 · 数据增强 ·

2023 年 4 月 6 日

A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation

翻译：应对类别不平衡问题的集成学习和数据增强模型综述：组合、实现和评估

Azal Ahmad Khan,Omkar Chaudhari,Rohitash Chandra

Class imbalance (CI) in classification problems arises when the number of observations belonging to one class is lower than the other classes. Ensemble learning that combines multiple models to obtain a robust model has been prominently used with data augmentation methods to address class imbalance problems. In the last decade, a number of strategies have been added to enhance ensemble learning and data augmentation methods, along with new methods such as generative adversarial networks (GANs). A combination of these has been applied in many studies, but the true rank of different combinations would require a computational review. In this paper, we present a computational review to evaluate data augmentation and ensemble learning methods used to address prominent benchmark CI problems. We propose a general framework that evaluates 10 data augmentation and 10 ensemble learning methods for CI problems. Our objective was to identify the most effective combination for improving classification performance on imbalanced datasets. The results indicate that combinations of data augmentation methods with ensemble learning can significantly improve classification performance on imbalanced datasets. These findings have important implications for the development of more effective approaches for handling imbalanced datasets in machine learning applications.

翻译：在分类问题中，当一个类别的观测数量比其他类别少时，便会出现类别不平衡（CI）。集成学习将多个模型组合起来以获得强韧模型，并广泛使用数据增强方法来解决类别不平衡问题。在过去的十年中，出现了许多策略来增强集成学习和数据增强方法，包括生成对抗网络（GANs）等新方法。这些策略的组合已被应用于许多研究中，但要找出不同组合的真正排名需要进行计算审查。在本文中，我们提出了一个计算评估框架，评估10种数据增强和10种集成学习方法来解决着名的CI问题。我们的目标是确定在不平衡数据集上提高分类性能的最有效组合。结果表明，数据增强方法与集成学习方法的组合可以显著提高不平衡数据集的分类性能。这些发现对于开发在机器学习应用中处理不平衡数据集的更有效方法具有重要意义。

1

相关内容

集成学习

集成学习是使用一系列学习器进行学习，并使用某种规则把各个学习结果进行整合从而获得比单个学习器更好的学习效果的一种机器学习方法。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

专知会员服务

21+阅读 · 2020年3月28日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

专知会员服务

134+阅读 · 2020年3月2日

【UMD开放书】机器学习课程书册，19章227页pdf，带你学习ML

【UMD开放书】机器学习课程书册，19章227页pdf，带你学习ML

专知会员服务

102+阅读 · 2019年12月9日

【目标检测 | 2019最新综述】目标检测中的不平衡问题，附31页PDF， Imbalance Problems in Object Detection: A Review

【目标检测 | 2019最新综述】目标检测中的不平衡问题，附31页PDF， Imbalance Problems in Object Detection: A Review

专知会员服务

46+阅读 · 2019年11月15日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

一行TensorFlow/Keras代码解决真实场景中数据不平衡(imbalanced)问题

一行TensorFlow/Keras代码解决真实场景中数据不平衡(imbalanced)问题

专知

78+阅读 · 2019年5月31日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

专知

15+阅读 · 2018年2月3日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

组合测试用例优先排序算法及选择策略研究

国家自然科学基金

8+阅读 · 2015年12月31日

强磁场下深过冷Cu-Co、Al-Bi难混溶合金的形核、生长及凝固组织演变行为研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向海量高维不平衡样本数据的恶意代码聚类及同源自动分析理论及技术

国家自然科学基金

0+阅读 · 2014年12月31日

共掺杂BiFeO3薄膜铁电、铁磁性能及磁电耦合的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

多尺度道路数据监督学习的匹配与选取更新方法

国家自然科学基金

0+阅读 · 2013年12月31日

编码密码学中若干组合对象研究

国家自然科学基金

0+阅读 · 2009年12月31日

多元逼近的贪婪算法与量子算法

国家自然科学基金

0+阅读 · 2009年12月31日

复杂环境下基于刚体模型和数据驱动的联合跟踪与分类

国家自然科学基金

1+阅读 · 2008年12月31日

适应多类型Insider Attack的入侵检测与精确定位方法的研究

国家自然科学基金

0+阅读 · 2008年12月31日

Uncertainty-based Detection of Adversarial Attacks in Semantic Segmentation

Arxiv

0+阅读 · 2023年5月22日

Evidence Networks: simple losses for fast, amortized, neural Bayesian model comparison

Arxiv

0+阅读 · 2023年5月18日

Physics-Informed Machine Learning: A Survey on Problems, Methods and Applications

Arxiv

72+阅读 · 2022年11月15日

Controllable Data Generation by Deep Learning: A Review

Arxiv

15+阅读 · 2022年7月19日

Causal Machine Learning: A Survey and Open Problems

Arxiv

70+阅读 · 2022年6月30日

Deep Learning for Medical Image Registration: A Comprehensive Review

Arxiv

20+阅读 · 2022年4月24日

An Overview on Machine Translation Evaluation

An Overview on Machine Translation Evaluation

Arxiv

14+阅读 · 2022年2月22日

Machine Learning: Algorithms, Models, and Applications

Arxiv

23+阅读 · 2022年1月6日

A Survey on Data Augmentation for Text Classification

A Survey on Data Augmentation for Text Classification

Arxiv

16+阅读 · 2021年7月7日

Deep learning for time series classification: a review

Arxiv

12+阅读 · 2019年3月14日

VIP会员

文章信息

相关主题

类别不平衡

不平衡数据

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

专知会员服务

21+阅读 · 2020年3月28日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

专知会员服务

134+阅读 · 2020年3月2日

【UMD开放书】机器学习课程书册，19章227页pdf，带你学习ML

【UMD开放书】机器学习课程书册，19章227页pdf，带你学习ML

专知会员服务

102+阅读 · 2019年12月9日

【目标检测 | 2019最新综述】目标检测中的不平衡问题，附31页PDF， Imbalance Problems in Object Detection: A Review

【目标检测 | 2019最新综述】目标检测中的不平衡问题，附31页PDF， Imbalance Problems in Object Detection: A Review

专知会员服务

46+阅读 · 2019年11月15日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《美国海军陆战队软件定义网络应用案例：分布式防火墙自动化系统》148页

《多体环境下定位导航授时（PNT）系统研究》228页

软件定义无线电（SDR）：商业与军事领域的技术、应用及未来趋势

《攻势防空作战中无人追击者/规避者最优轨迹研究（含动态交战区建模）》95页

相关资讯

一行TensorFlow/Keras代码解决真实场景中数据不平衡(imbalanced)问题

一行TensorFlow/Keras代码解决真实场景中数据不平衡(imbalanced)问题

专知

78+阅读 · 2019年5月31日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

专知

15+阅读 · 2018年2月3日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Uncertainty-based Detection of Adversarial Attacks in Semantic Segmentation

Arxiv

0+阅读 · 2023年5月22日

Evidence Networks: simple losses for fast, amortized, neural Bayesian model comparison

Arxiv

0+阅读 · 2023年5月18日

Physics-Informed Machine Learning: A Survey on Problems, Methods and Applications

Arxiv

72+阅读 · 2022年11月15日

Controllable Data Generation by Deep Learning: A Review

Arxiv

15+阅读 · 2022年7月19日

Causal Machine Learning: A Survey and Open Problems

Arxiv

70+阅读 · 2022年6月30日

Deep Learning for Medical Image Registration: A Comprehensive Review

Arxiv

20+阅读 · 2022年4月24日

An Overview on Machine Translation Evaluation

An Overview on Machine Translation Evaluation

Arxiv

14+阅读 · 2022年2月22日

Machine Learning: Algorithms, Models, and Applications

Arxiv

23+阅读 · 2022年1月6日

A Survey on Data Augmentation for Text Classification

A Survey on Data Augmentation for Text Classification

Arxiv

16+阅读 · 2021年7月7日

Deep learning for time series classification: a review

Arxiv

12+阅读 · 2019年3月14日

相关基金

组合测试用例优先排序算法及选择策略研究

国家自然科学基金

8+阅读 · 2015年12月31日

强磁场下深过冷Cu-Co、Al-Bi难混溶合金的形核、生长及凝固组织演变行为研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向海量高维不平衡样本数据的恶意代码聚类及同源自动分析理论及技术

国家自然科学基金

0+阅读 · 2014年12月31日

共掺杂BiFeO3薄膜铁电、铁磁性能及磁电耦合的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

多尺度道路数据监督学习的匹配与选取更新方法

国家自然科学基金

0+阅读 · 2013年12月31日

编码密码学中若干组合对象研究

国家自然科学基金

0+阅读 · 2009年12月31日

多元逼近的贪婪算法与量子算法

国家自然科学基金

0+阅读 · 2009年12月31日

复杂环境下基于刚体模型和数据驱动的联合跟踪与分类

国家自然科学基金

1+阅读 · 2008年12月31日

适应多类型Insider Attack的入侵检测与精确定位方法的研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员