Mitigating bias in calibration error estimation - 专知 (Zhuanzhi) Papers


December 15, 2020

Mitigating bias in calibration error estimation


Rebecca Roelofs,Nicholas Cain,Jonathon Shlens,Michael C. Mozer

Building reliable machine learning systems requires that we correctly understand their level of confidence. Calibration focuses on measuring the degree of accuracy in a model's confidence and most research in calibration focuses on techniques to improve an empirical estimate of calibration error, ECE_bin. Using simulation, we show that ECE_bin can systematically underestimate or overestimate the true calibration error depending on the nature of model miscalibration, the size of the evaluation data set, and the number of bins. Critically, ECE_bin is more strongly biased for perfectly calibrated models. We propose a simple alternative calibration error metric, ECE_sweep, in which the number of bins is chosen to be as large as possible while preserving monotonicity in the calibration function. Evaluating our measure on distributions fit to neural network confidence scores on CIFAR-10, CIFAR-100, and ImageNet, we show that ECE_sweep produces a less biased estimator of calibration error and therefore should be used by any researcher wishing to evaluate the calibration of models trained on similar datasets.
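The two estimators named in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so several choices here are assumptions: the function names `ece_bin` and `ece_sweep` are hypothetical, `ece_bin` uses equal-width bins with a default of 15, `ece_sweep` uses equal-mass bins, both use the absolute (L1) gap between per-bin accuracy and per-bin mean confidence, and the sweep stops at the first bin count whose per-bin accuracies are no longer non-decreasing.

```python
import numpy as np


def ece_bin(conf, correct, n_bins=15):
    """Binned calibration error with equal-width bins (assumed variant)."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each confidence to a bin index in [0, n_bins - 1];
    # a confidence of exactly 1.0 falls in the last bin.
    idx = np.clip(np.digitize(conf, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # Weight each bin's |accuracy - mean confidence| by its share of samples.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece


def _ece_equal_mass(conf_sorted, correct_sorted, n_bins):
    """ECE and per-bin accuracies for equal-mass bins over pre-sorted inputs."""
    n = len(conf_sorted)
    ece, accs = 0.0, []
    for c, y in zip(np.array_split(conf_sorted, n_bins),
                    np.array_split(correct_sorted, n_bins)):
        accs.append(y.mean())
        ece += len(c) / n * abs(y.mean() - c.mean())
    return ece, accs


def ece_sweep(conf, correct):
    """Use the largest bin count (swept upward from 1) that keeps per-bin
    accuracy monotonically non-decreasing in confidence."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(conf)
    c, y = conf[order], correct[order]
    best, _ = _ece_equal_mass(c, y, 1)
    for b in range(2, len(c) + 1):
        ece, accs = _ece_equal_mass(c, y, b)
        if np.any(np.diff(accs) < 0):  # monotonicity violated: stop sweeping
            break
        best = ece
    return best


if __name__ == "__main__":
    # Synthetic, perfectly calibrated scores: P(correct) equals the confidence.
    rng = np.random.default_rng(0)
    conf = rng.uniform(0.5, 1.0, size=500)
    correct = rng.uniform(size=500) < conf
    print("ECE_bin  :", ece_bin(conf, correct))
    print("ECE_sweep:", ece_sweep(conf, correct))
```

The sweep recomputes the binning for every candidate bin count, which is quadratic in the worst case; it is meant to show the selection rule, not to be an efficient implementation.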

