With model trustworthiness being crucial for sensitive real-world applications, practitioners are increasingly focused on improving the uncertainty calibration of deep neural networks. Calibration errors are designed to quantify the reliability of probabilistic predictions, but their estimators are usually biased and inconsistent. In this work, we introduce the framework of proper calibration errors, which relates every calibration error to a proper score and provides a corresponding upper bound with optimal estimation properties. This relationship can be used to reliably quantify improvements in model calibration. We demonstrate, both theoretically and empirically, the shortcomings of commonly used estimators compared to our approach. Due to the wide applicability of proper scores, this yields a natural extension of recalibration beyond classification.
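As a minimal illustrative sketch (not the paper's exact procedure), the snippet below contrasts the popular binned ECE estimator, which is biased and binning-sensitive, with the Brier score, a proper score whose sample average is unbiased and which, via the calibration-refinement decomposition, upper-bounds the squared L2 calibration error. The function names and the toy data are hypothetical and chosen only for illustration.

```python
# Sketch under standard definitions: binned ECE vs. a proper-score upper bound.
import numpy as np

def binned_ece(confidences, correct, n_bins=15):
    """Standard binned ECE estimator (biased and sensitive to the binning scheme)."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

def brier_score(probs, labels):
    """Mean Brier score: sample average of a proper score; its calibration
    (reliability) term upper-bounds the squared L2 calibration error."""
    one_hot = np.eye(probs.shape[1])[labels]
    return np.mean(np.sum((probs - one_hot) ** 2, axis=1))

# Toy usage with hypothetical predictions and labels.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=1000)   # simulated predicted distributions
labels = rng.integers(0, 3, size=1000)          # simulated ground-truth labels
conf = probs.max(axis=1)
correct = (probs.argmax(axis=1) == labels).astype(float)
print("binned ECE estimate:", binned_ece(conf, correct))
print("Brier score (proper-score upper bound):", brier_score(probs, labels))
```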