We study the problem of semantic segmentation calibration. Many approaches have been proposed to address the miscalibration of model confidence in image classification; however, to date, research on confidence calibration for semantic segmentation remains limited. We provide a systematic study of the calibration of semantic segmentation models and propose a simple yet effective approach. First, we find that model capacity, crop size, multi-scale testing, and prediction correctness all affect calibration. Among these factors, prediction correctness, and misprediction in particular, matters most: mispredictions drive miscalibration through over-confidence. Next, we propose a simple, unifying, and effective approach, namely selective scaling, which separates correct from incorrect predictions before scaling and focuses on smoothing the logits of mispredictions. We then review popular existing calibration methods and compare them with selective scaling on semantic segmentation calibration. Extensive experiments across a variety of benchmarks, covering both in-domain and domain-shift calibration, show that selective scaling consistently outperforms the other methods.
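The core idea of selective scaling — applying a stronger smoothing temperature to predictions flagged as incorrect — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the correctness mask would in practice come from an auxiliary correct/incorrect classifier, and the temperature values `t_correct` and `t_wrong` here are placeholder assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def selective_scaling(logits, correct_mask, t_correct=1.0, t_wrong=2.0):
    """Scale per-pixel logits with two temperatures.

    logits:       array of shape (..., num_classes)
    correct_mask: boolean array of shape (...,); True where the
                  prediction is judged correct (e.g. by an auxiliary
                  classifier), False for suspected mispredictions.
    A larger temperature for mispredictions smooths their logits,
    reducing the over-confidence that drives miscalibration.
    """
    temps = np.where(correct_mask, t_correct, t_wrong)
    return softmax(logits / temps[..., None])

# Two pixels, three classes: one judged correct, one a suspected misprediction.
logits = np.array([[2.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0]])
mask = np.array([True, False])
probs = selective_scaling(logits, mask)
# The misprediction's confidence is lowered relative to plain softmax.
print(probs.max(axis=-1), softmax(logits).max(axis=-1))
```

With `t_wrong > 1`, only the suspected mispredictions are softened; correctly predicted pixels keep their original (well-calibrated) confidence, which is the asymmetry the abstract describes.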