

February 24, 2021

Mitigating Bias in Calibration Error Estimation


Rebecca Roelofs,Nicholas Cain,Jonathon Shlens,Michael C. Mozer

Building reliable machine learning systems requires that we correctly understand their level of confidence. Calibration measures the degree of accuracy in a model's confidence and most research in calibration focuses on techniques to improve an empirical estimate of calibration error, ECE_bin. We introduce a simulation framework that allows us to empirically show that ECE_bin can systematically underestimate or overestimate the true calibration error depending on the nature of model miscalibration, the size of the evaluation data set, and the number of bins. Critically, we find that ECE_bin is more strongly biased for perfectly calibrated models. We propose a simple alternative calibration error metric, ECE_sweep, in which the number of bins is chosen to be as large as possible while preserving monotonicity in the calibration function. Evaluating our measure on distributions fit to neural network confidence scores on CIFAR-10, CIFAR-100, and ImageNet, we show that ECE_sweep produces a less biased estimator of calibration error and therefore should be used by any researcher wishing to evaluate the calibration of models trained on similar datasets.
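The two estimators named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equal-mass bins, an absolute-difference (L1) gap between per-bin accuracy and per-bin mean confidence, and a sweep that stops at the first bin count whose per-bin accuracies are no longer non-decreasing. The function names `ece_bin` and `ece_sweep` and their signatures are hypothetical, chosen here for illustration only.

```python
import numpy as np


def ece_bin(conf, correct, n_bins):
    """Binned calibration error with equal-mass bins (an assumption of this sketch).

    Returns the weighted mean of |accuracy - mean confidence| over the bins,
    together with the per-bin accuracies ordered by increasing confidence.
    """
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Sort samples by confidence so that consecutive chunks form the bins.
    order = np.argsort(conf)
    conf, correct = conf[order], correct[order]

    err = 0.0
    accs = []
    for idx in np.array_split(np.arange(conf.size), n_bins):
        if idx.size == 0:  # only possible when n_bins exceeds the sample count
            continue
        acc = correct[idx].mean()
        accs.append(acc)
        err += idx.size / conf.size * abs(acc - conf[idx].mean())
    return err, np.array(accs)


def ece_sweep(conf, correct):
    """Increase the bin count until per-bin accuracy stops being monotone.

    Returns the binned error at the last bin count that kept the per-bin
    accuracies non-decreasing. Stopping at the first violation is an
    assumption of this sketch.
    """
    best, _ = ece_bin(conf, correct, 1)
    for b in range(2, len(conf) + 1):
        err, accs = ece_bin(conf, correct, b)
        if np.any(np.diff(accs) < 0):  # monotonicity violated: stop sweeping
            break
        best = err
    return best


if __name__ == "__main__":
    conf = np.array([0.2, 0.4, 0.6, 0.8])
    correct = np.array([0, 1, 0, 1])
    print(ece_bin(conf, correct, 4)[0])  # one sample per bin
    print(ece_sweep(conf, correct))      # sweep stops before 3 bins
```

On this four-sample toy input the sweep settles on two bins, because splitting into three bins yields per-bin accuracies of 0.5, 0, and 1, which are not monotone. With a fixed bin count the user must pick that count by hand, which is the choice the abstract identifies as a source of bias.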

