Despite the strong performance of deep neural networks, recent work has shown that they are poorly calibrated, resulting in over-confident predictions. Miscalibration can be exacerbated by overfitting due to the minimization of the cross-entropy loss during training, as it pushes the predicted softmax probabilities toward the one-hot label assignments. This yields a pre-softmax activation of the correct class that is significantly larger than the remaining activations. Recent evidence from the literature suggests that loss functions that embed implicit or explicit maximization of the entropy of predictions yield state-of-the-art calibration performance. We provide a unifying constrained-optimization perspective on current state-of-the-art calibration losses. Specifically, these losses can be viewed as approximations of a linear penalty (or a Lagrangian term) imposing equality constraints on logit distances. This points to an important limitation of such underlying equality constraints: their gradients constantly push towards a non-informative solution, which might prevent the model from reaching the best compromise between discriminative performance and calibration during gradient-based optimization. Following these observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. Comprehensive experiments on a variety of image classification, semantic segmentation, and NLP benchmarks demonstrate that our method sets new state-of-the-art results on these tasks in terms of network calibration, without affecting discriminative performance. The code is available at https://github.com/by-liu/MbLS.
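
To make the inequality-constrained view concrete, the sketch below shows a cross-entropy loss augmented with a hinge penalty on logit distances: a logit is only penalized when it falls more than a chosen margin below the largest logit of the same sample, so the gradient becomes inactive once the constraint is satisfied. This is a minimal illustrative PyTorch sketch of the idea stated in the abstract, not the authors' exact implementation (see the repository above for that); the class name, the hyper-parameters `margin` and `alpha`, their default values, and the uniform averaging of the penalty are assumptions made here for illustration.

```python
# Minimal illustrative sketch (assumed names/defaults, not the official MbLS code):
# cross-entropy plus a hinge penalty that activates only when a logit falls
# more than `margin` below the per-sample maximum logit.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MarginLogitPenaltyLoss(nn.Module):
    def __init__(self, margin: float = 10.0, alpha: float = 0.1):
        super().__init__()
        self.margin = margin  # maximum tolerated logit distance (illustrative default)
        self.alpha = alpha    # weight of the penalty term (illustrative default)

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Standard discriminative term.
        ce = F.cross_entropy(logits, targets)

        # Distances of every logit to the per-sample maximum logit (all >= 0).
        max_logit, _ = logits.max(dim=1, keepdim=True)
        distances = max_logit - logits  # shape: (batch, num_classes)

        # Hinge penalty: only distances exceeding the margin contribute,
        # so gradients stop pushing once the inequality constraint holds.
        penalty = F.relu(distances - self.margin).mean()

        return ce + self.alpha * penalty


# Usage: drop-in replacement for a plain cross-entropy criterion during training.
# criterion = MarginLogitPenaltyLoss(margin=10.0, alpha=0.1)
# loss = criterion(model(images), labels)
```

With margin set to zero the penalty keeps pushing all logits toward equality, mirroring the equality-constrained behaviour criticized above; a positive margin leaves the gradient inactive whenever logit distances are already within the margin, which is the controllable relaxation the abstract describes.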


