Uncertainty quantification is a central challenge in reliable and trustworthy machine learning. Naive measures such as last-layer scores are well known to yield overconfident estimates in the context of overparametrized neural networks. Several methods, ranging from temperature scaling to different Bayesian treatments of neural networks, have been proposed to mitigate overconfidence, most often supported by the numerical observation that they yield better-calibrated uncertainty measures. In this work, we provide a sharp comparison between popular uncertainty measures for binary classification in a mathematically tractable model for overparametrized neural networks: the random features model. We discuss a trade-off between classification accuracy and calibration, unveiling a double-descent-like behavior in the calibration curve of optimally regularized estimators as a function of overparametrization. This is in contrast with the empirical Bayes method, which we show to be well calibrated in our setting despite the higher generalization error and overparametrization.
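To make the setting concrete, the following minimal sketch (not taken from the paper) sets up a random features model for binary classification and evaluates the calibration of its last-layer scores with a simple binned estimate of the expected calibration error. The dimensions, the logistic teacher generating the labels, the tanh nonlinearity, and the binning scheme are all illustrative assumptions.

```python
# Hypothetical sketch: random features classifier + binned calibration estimate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, p, n = 100, 400, 200          # input dim, number of random features, training samples

# Synthetic "teacher": labels drawn from a logistic model on the raw inputs (an assumption).
w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_star))).astype(int)

# Random features: fixed random first layer W, trained second layer ("last-layer scores").
W = rng.standard_normal((d, p)) / np.sqrt(d)
phi = lambda Z: np.tanh(Z @ W)   # choice of nonlinearity is arbitrary here

clf = LogisticRegression(C=1.0, max_iter=1000).fit(phi(X), y)   # C ~ 1 / regularization strength

# Held-out calibration check: compare predicted confidence with empirical frequency per bin.
X_test = rng.standard_normal((5000, d))
y_test = (rng.random(5000) < 1.0 / (1.0 + np.exp(-X_test @ w_star))).astype(int)
proba = clf.predict_proba(phi(X_test))[:, 1]

bins = np.linspace(0.0, 1.0, 11)
ece = 0.0
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (proba >= lo) & (proba < hi)
    if mask.any():
        # |mean confidence - empirical frequency|, weighted by the fraction of points in the bin
        ece += mask.mean() * abs(proba[mask].mean() - y_test[mask].mean())
print(f"expected calibration error ~ {ece:.3f}")
```

Sweeping the ratio p/n (overparametrization) and the regularization strength in such a sketch is one way to trace out calibration curves of the kind discussed above.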