It is now well known that neural networks can be wrong with high confidence in their predictions, leading to poor calibration. The most common post-hoc approach to compensate for this is temperature scaling, which adjusts the confidence of the prediction on any input by scaling the logits by a single fixed value. Whilst this approach typically improves the average calibration across the whole test dataset, it does so by reducing the individual confidences of the predictions irrespective of whether the classification of a given input is correct or incorrect. With this insight, we base our method on the observation that different samples contribute to the calibration error by varying amounts, with some needing their confidence increased and others needing it decreased. Therefore, for each input, we propose to predict a different temperature value, allowing us to adjust the mismatch between confidence and accuracy at a finer granularity. Furthermore, we observe improved results on OOD detection, and our method also yields a notion of hardness for individual data points. Our method is applied post-hoc to off-the-shelf pre-trained classifiers, consequently requiring very little computation time and a negligible memory footprint. We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets, showing that producing per-data-point temperatures is also beneficial for the expected calibration error across the whole test set. Code is available at: https://github.com/thwjoy/adats.
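For concreteness, the standard temperature-scaling baseline described above can be sketched as follows. This is a minimal illustration assuming PyTorch, not the authors' code: the helper name `fit_temperature` is hypothetical, and a single scalar T is fitted on held-out validation logits by minimising the negative log-likelihood, after which every test prediction becomes softmax(logits / T).

```python
# Minimal sketch of standard (single-scalar) temperature scaling.
# Assumes PyTorch; `fit_temperature` is a hypothetical helper, not the paper's API.
import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> torch.Tensor:
    """Fit one global temperature T on a validation split by minimising NLL."""
    log_t = torch.zeros(1, requires_grad=True)  # optimise log T so that T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().detach()

# Usage: the same scalar T rescales every test-time prediction,
# which is exactly why confidences move in one direction for all inputs.
#   T = fit_temperature(val_logits, val_labels)
#   probs = F.softmax(test_logits / T, dim=-1)
```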
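The per-input variant proposed in the abstract replaces this single scalar with a temperature predicted separately for each data point. The sketch below is one assumed way to realise that idea with a small head on top of a frozen classifier's features; the module name `TempHead`, the hidden width, and the feature-extraction interface are all illustrative assumptions, and the authors' actual implementation is in the linked repository.

```python
# Hedged sketch of per-data-point temperature prediction (illustrative only;
# see https://github.com/thwjoy/adats for the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TempHead(nn.Module):
    """Hypothetical head: predicts one positive temperature per input
    from the frozen backbone's features."""
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # softplus keeps T > 0; shape (batch, 1) broadcasts over the logits,
        # so each sample's confidence can be raised or lowered independently
        return F.softplus(self.net(feats)) + 1e-3

# Usage with an off-the-shelf pre-trained classifier, applied post-hoc
# (assumes the backbone exposes its penultimate features):
#   feats, logits = backbone(x)
#   T = temp_head(feats)                    # per-sample temperatures
#   probs = F.softmax(logits / T, dim=-1)   # finer-grained calibration
```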