Under-bagging Nearest Neighbors for Imbalanced Classification - 专知 (Zhuanzhi) Papers

Nearest neighbors · Bagging · Reducible · Performer · Ensemble learning

September 1, 2021

Under-bagging Nearest Neighbors for Imbalanced Classification


Hanyuan Hang, Yuchao Cai, Hanfang Yang, Zhouchen Lin

In this paper, we propose an ensemble learning algorithm called under-bagging $k$-nearest neighbors (under-bagging $k$-NN) for imbalanced classification problems. On the theoretical side, by developing a new learning theory analysis, we show that with properly chosen parameters, i.e., the number of nearest neighbors $k$, the expected sub-sample size $s$, and the number of bagging rounds $B$, under-bagging $k$-NN achieves optimal convergence rates with respect to the arithmetic mean (AM) of recalls under mild assumptions. Moreover, we show that with a relatively small $B$, the expected sub-sample size $s$ at each bagging round can be much smaller than the number of training data $n$, and the number of nearest neighbors $k$ can be reduced simultaneously, especially when the data are highly imbalanced; this leads to substantially lower time complexity and roughly the same space complexity. On the practical side, we conduct numerical experiments that verify the theoretical results on the benefits of the under-bagging technique, as shown by the promising AM performance and efficiency of the proposed algorithm.
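To make the roles of $k$, $s$, and $B$ concrete, the following is a minimal Python sketch of a generic under-bagging $k$-NN classifier together with the AM-of-recalls metric. It is an illustration under stated assumptions, not the authors' implementation: each round here draws a fixed-size, class-balanced sub-sample without replacement (the abstract only specifies an expected sub-sample size $s$), neighbors are found by brute-force Euclidean distance, and the function names `knn_posterior`, `under_bagging_knn_predict`, and `am_of_recalls` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def knn_posterior(train, x, k, classes):
    """Fraction of each class among the k nearest training points to x."""
    nearest = sorted(
        train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x))
    )[:k]
    counts = Counter(label for _, label in nearest)
    return {c: counts[c] / len(nearest) for c in classes}


def under_bagging_knn_predict(data, x, k, s, B, rng):
    """Average k-NN posteriors over B class-balanced sub-samples.

    Assumption: each round takes about s / (number of classes) points per
    class, without replacement; the paper's sampling scheme may differ.
    """
    by_class = defaultdict(list)
    for point in data:
        by_class[point[1]].append(point)
    classes = sorted(by_class)
    per_class = max(1, s // len(classes))
    votes = {c: 0.0 for c in classes}
    for _ in range(B):
        sub = []
        for c in classes:
            pool = by_class[c]
            # Under-sample: large classes are cut down to per_class points.
            sub.extend(rng.sample(pool, min(per_class, len(pool))))
        post = knn_posterior(sub, x, min(k, len(sub)), classes)
        for c in classes:
            votes[c] += post[c] / B
    return max(classes, key=lambda c: votes[c])


def am_of_recalls(y_true, y_pred):
    """Arithmetic mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy imbalanced data: 40 majority points near the origin, 4 minority points near (5, 5).
data = [((0.01 * i, 0.0), 0) for i in range(40)]
data += [((5.0 + 0.01 * i, 5.0), 1) for i in range(4)]
rng = random.Random(0)
queries = [(0.0, 0.0), (5.0, 5.0)]
preds = [under_bagging_knn_predict(data, q, k=3, s=8, B=5, rng=rng) for q in queries]
print(preds, am_of_recalls([0, 1], preds))
```

Because every round searches only about $s$ points instead of all $n$, each query costs on the order of $B \cdot s$ distance computations in this brute-force version, which is where a small $B$ and $s \ll n$ would save time. The AM of recalls weights every class equally, so a classifier that ignores the minority class scores poorly even when its overall accuracy is high.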

