The vast majority of statistical theory on binary classification characterizes performance in terms of accuracy. However, accuracy is known in many cases to poorly reflect the practical consequences of classification error, most famously in imbalanced binary classification, where data are dominated by samples from one of the two classes. The first part of this paper derives a novel generalization of the Bayes-optimal classifier from accuracy to any performance metric computed from the confusion matrix. Specifically, this result (a) demonstrates that stochastic classifiers sometimes outperform the best possible deterministic classifier and (b) removes an empirically unverifiable absolute continuity assumption that is poorly understood but pervades existing results. We then demonstrate how to use this generalized Bayes classifier to obtain regret bounds in terms of the error of estimating regression functions under uniform loss. Finally, we use these results to develop some of the first finite-sample statistical guarantees specific to imbalanced binary classification. In particular, we demonstrate that optimal classification performance depends on properties of class imbalance, such as a novel notion we call Uniform Class Imbalance, that have not previously been formalized. We further illustrate these contributions numerically in the case of $k$-nearest neighbor classification.
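To give a sense of claim (a), here is a minimal sketch (not an example from the paper itself): when the feature distribution has an atom, randomizing the prediction on that atom can strictly beat every deterministic classifier under a metric that is a nonlinear function of the population confusion matrix. The metric $G(C) = \mathrm{TP} - 2\,\mathrm{FP}^2$ and the single-atom setup with $P(Y=1 \mid X) = 0.5$ are hypothetical choices made purely for illustration.

```python
# Toy illustration (assumed setup, not from the paper): X is a single atom,
# P(Y = 1 | X) = 0.5, and the metric G(C) = TP - 2*FP^2 is an arbitrary
# nonlinear function of the population confusion matrix.

def metric(q, eta=0.5):
    """Value of G = TP - 2*FP^2 for a classifier that predicts class 1
    with probability q on the single atom, where eta = P(Y = 1 | X)."""
    tp = eta * q          # P(Y = 1 and predict 1)
    fp = (1 - eta) * q    # P(Y = 0 and predict 1)
    return tp - 2 * fp ** 2

# The only deterministic rules on one atom: always predict 0, or always predict 1.
deterministic = max(metric(0.0), metric(1.0))

# Stochastic rules: search over prediction probabilities q on a grid in [0, 1].
stochastic = max(metric(q / 100) for q in range(101))

print(deterministic)  # 0.0
print(stochastic)     # 0.125, attained at q = 0.5
```

Here both deterministic rules score $G = 0$, while predicting class 1 with probability $q = 0.5$ attains $G = 0.125$; because a deterministic threshold cannot split the atom, only randomization reaches the interior of the achievable set of confusion matrices.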