Calibration of neural networks is a critical consideration when incorporating machine learning models into real-world decision-making systems, where the confidence of a decision is as important as the decision itself. In recent years, there has been a surge of research on neural network calibration, and the majority of these works can be categorized as post-hoc calibration methods, defined as methods that learn an additional function to calibrate an already trained base network. In this work, we aim to understand post-hoc calibration methods from a theoretical point of view. In particular, it is known that minimizing the Negative Log-Likelihood (NLL) leads to a calibrated network on the training set if the global optimum is attained (Bishop, 1994). Nevertheless, it is not clear whether learning an additional function in a post-hoc manner leads to calibration in this theoretical sense. To this end, we prove that even if the base network ($f$) does not reach the global optimum of the NLL, one can obtain a calibrated network $g \circ f$ by adding additional layers ($g$) and minimizing the NLL with respect to the parameters of $g$. This not only provides a less stringent condition for obtaining a calibrated network but also offers a theoretical justification for post-hoc calibration methods. Our experiments on various image classification benchmarks confirm the theory.
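To make the setup concrete, the following is a minimal PyTorch-style sketch of post-hoc NLL minimization as described above, not the authors' actual implementation: a trained base network $f$ is frozen, extra layers $g$ are appended to its logits, and the NLL (cross-entropy over the softmax output) is minimized over the parameters of $g$ alone on a held-out calibration set. The names `base_model`, `CalibrationHead`, and `calib_loader` are illustrative assumptions.

```python
# Hypothetical sketch of post-hoc calibration by NLL minimization over g only.
import torch
import torch.nn as nn


class CalibrationHead(nn.Module):
    """Additional layers g applied to the logits of the frozen base network f."""

    def __init__(self, num_classes: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return self.net(logits)


def calibrate(base_model: nn.Module, calib_loader, num_classes: int,
              epochs: int = 10, lr: float = 1e-3) -> nn.Module:
    """Train g to minimize the NLL of g∘f while keeping f fixed."""
    base_model.eval()
    for p in base_model.parameters():  # f is frozen; only g is optimized
        p.requires_grad_(False)

    g = CalibrationHead(num_classes)
    optimizer = torch.optim.Adam(g.parameters(), lr=lr)
    nll = nn.CrossEntropyLoss()  # NLL of the softmax output

    for _ in range(epochs):
        for x, y in calib_loader:
            with torch.no_grad():
                z = base_model(x)       # logits of the base network f
            loss = nll(g(z), y)         # NLL of g∘f, minimized over g only
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return g
```

The design choice mirrored here is that $f$ stays fixed throughout, so any improvement in calibration comes purely from optimizing $g$, matching the theoretical statement about $g \circ f$ above.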