The deployment of machine learning classifiers in high-stakes domains requires well-calibrated confidence scores for model predictions. In this paper we introduce the notion of variable-based calibration to characterize calibration properties of a model with respect to a variable of interest, generalizing traditional score-based calibration and metrics such as expected calibration error (ECE). In particular, we find that models with near-perfect ECE can exhibit significant variable-based calibration error as a function of features of the data. We demonstrate this phenomenon both theoretically and in practice on multiple well-known datasets, and show that it can persist after the application of existing recalibration methods. To mitigate this issue, we propose strategies for detection, visualization, and quantification of variable-based calibration error. We then examine the limitations of current score-based recalibration methods and explore potential modifications. Finally, we discuss the implications of these findings, emphasizing that an understanding of calibration beyond simple aggregate measures is crucial for endeavors such as fairness and model interpretability.
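To make the contrast between aggregate and variable-based calibration concrete, the following is a minimal illustrative sketch, not the paper's implementation: a standard binned ECE estimator alongside an analogous error computed by binning on a feature of interest. The function names, the quantile binning scheme, and the `variable_based_ce` helper are assumptions introduced here for illustration only.

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Binned expected calibration error: bin predictions by confidence,
    then average |accuracy - mean confidence| weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(confidences, edges[1:-1])  # bin index 0 .. n_bins-1
    total, n = 0.0, len(confidences)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            total += (mask.sum() / n) * gap
    return total

def variable_based_ce(confidences, correct, variable, n_bins=10):
    """Analogous binned error, but binning on a feature of interest
    (e.g., age) instead of the confidence score, so miscalibration that
    varies with the feature is not averaged away."""
    edges = np.quantile(variable, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.digitize(variable, edges[1:-1])
    total, n = 0.0, len(variable)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            total += (mask.sum() / n) * gap
    return total
```

Under this kind of estimator, a model can have small ECE because over- and under-confidence cancel across confidence bins, while the same per-sample errors line up systematically within feature bins and produce a large variable-based error.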