Classification of imbalanced data is one of the common problems in the field of data mining. Imbalanced data substantially degrades the performance of standard classification models. Data-level approaches mainly address the problem with oversampling methods such as the synthetic minority oversampling technique (SMOTE). However, because methods such as SMOTE generate instances by linear interpolation, the synthetic data space may take a polygonal shape. In addition, oversampling methods can generate outliers of the minority class. In this paper, we propose the Gaussian-based minority oversampling technique (GMOTE), which takes a statistical perspective on imbalanced datasets. To avoid linear interpolation and to account for outliers, the proposed method generates instances with a Gaussian Mixture Model. Motivated by the clustering-based multivariate Gaussian outlier score (CMGOS), we adapt the tail probability of instances through the Mahalanobis distance to account for local outliers. Experiments were carried out on a representative set of benchmark datasets, and the performance of GMOTE was compared with that of other methods such as SMOTE. When GMOTE is combined with a classification and regression tree (CART) or a support vector machine (SVM), it achieves better accuracy and F1-score. The experimental results demonstrate its robust performance.
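The core idea above (sample synthetic minority instances from a fitted Gaussian Mixture Model, then reject candidates whose chi-squared tail probability of the squared Mahalanobis distance to their mixture component marks them as outliers) can be sketched as follows. This is a minimal illustration under assumed defaults, not the authors' reference implementation; the function name `gmote_sketch` and the parameters `n_components`, `alpha`, and the candidate batch size are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.mixture import GaussianMixture

def gmote_sketch(X_min, n_new, n_components=2, alpha=0.05, seed=0):
    """Sketch of a GMOTE-style oversampler (assumed interface).

    Fits a GMM to the minority class X_min, draws candidate synthetic
    points, and keeps only candidates whose chi-squared tail probability
    (of the squared Mahalanobis distance to their assigned component)
    exceeds `alpha`, i.e. points that are not local outliers.
    """
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X_min)
    d = X_min.shape[1]
    # Draw a generous candidate batch; with alpha = 0.05 roughly 95% survive.
    cand, labels = gmm.sample(4 * n_new)
    kept = []
    for x, k in zip(cand, labels):
        diff = x - gmm.means_[k]
        # Squared Mahalanobis distance to the assigned component.
        m2 = diff @ np.linalg.inv(gmm.covariances_[k]) @ diff
        # Under the component's Gaussian, m2 ~ chi-squared with d degrees
        # of freedom; a small tail probability flags a likely outlier.
        if 1.0 - chi2.cdf(m2, df=d) > alpha:
            kept.append(x)
    return np.asarray(kept[:n_new])
```

The rejection step is what distinguishes this from plain GMM sampling: interpolation-free instances are drawn from the fitted density, and the Mahalanobis-based tail probability discards candidates that would become minority-class outliers.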