Statistical Theory for Imbalanced Binary Classification - Zhuanzhi (专知) Papers


July 5, 2021

Statistical Theory for Imbalanced Binary Classification


Shashank Singh, Justin Khim

From arXiv. Parts of this paper have been revised from arXiv:2004.04715v2 [math.ST].

Within the vast body of statistical theory developed for binary classification, few meaningful results exist for imbalanced classification, in which data are dominated by samples from one of the two classes. Existing theory faces at least two main challenges. First, meaningful results must consider more complex performance measures than classification accuracy. To address this, we characterize a novel generalization of the Bayes-optimal classifier to any performance metric computed from the confusion matrix, and we use this to show how relative performance guarantees can be obtained in terms of the error of estimating the class probability function under uniform ($\mathcal{L}_\infty$) loss. Second, as we show, optimal classification performance depends on certain properties of class imbalance that have not previously been formalized. Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance. We analyze how Uniform Class Imbalance influences optimal classifier performance and show that it necessitates different classifier behavior than other types of class imbalance. We further illustrate these two contributions in the case of $k$-nearest neighbor classification, for which we develop novel guarantees. Together, these results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.
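To make the abstract's first point concrete, the following is a minimal sketch and not the authors' construction. It shows why accuracy can be misleading under class imbalance, and how a metric computed from the confusion matrix (here F1) can be improved by thresholding an estimate of the class probability function. The k-nearest neighbor estimator on 1-D features, the synthetic data model `P(Y=1 | x) = 0.3 * x`, the deterministic threshold rule, and all function names are assumptions made purely for illustration.

```python
import random

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def f1(tp, fp, fn, tn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def knn_prob(x, xs, ys, k):
    """k-NN estimate of P(Y=1 | X=x): mean label of the k nearest points."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / k

def tune_threshold(probs, y_true, metric):
    """Pick the threshold (among observed probabilities and 0.5) maximizing metric."""
    candidates = sorted(set(probs) | {0.5})
    scored = [
        (metric(*confusion_counts(y_true, [int(p >= t) for p in probs])), t)
        for t in candidates
    ]
    best_score, best_t = max(scored)
    return best_t, best_score

def sample(n, rng):
    """Hypothetical imbalanced data: P(Y=1 | x) = 0.3 * x, so positives are rare."""
    xs = [rng.random() for _ in range(n)]
    ys = [int(rng.random() < 0.3 * x) for x in xs]
    return xs, ys

# Predicting "all negative" on a 95/5 split: high accuracy, zero F1.
counts = confusion_counts([1] * 5 + [0] * 95, [0] * 100)
print("all-negative accuracy:", accuracy(*counts), "F1:", f1(*counts))

rng = random.Random(0)
train_x, train_y = sample(400, rng)
val_x, val_y = sample(200, rng)
probs = [knn_prob(x, train_x, train_y, k=25) for x in val_x]

default_f1 = f1(*confusion_counts(val_y, [int(p >= 0.5) for p in probs]))
best_t, best_f1 = tune_threshold(probs, val_y, f1)
print("F1 at threshold 0.5:", default_f1)
print("tuned threshold:", best_t, "tuned F1:", best_f1)
```

Because the threshold is selected and scored on the same validation sample, the tuned F1 is optimistic; a separate held-out set would be needed for an honest estimate.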

