We study the problem of semantic segmentation calibration. For image classification, many solutions have been proposed to alleviate the miscalibration of model confidence. To date, however, research on confidence calibration for semantic segmentation remains limited. We provide a systematic study of the calibration of semantic segmentation models and propose a simple yet effective approach. First, we find that model capacity, crop size, multi-scale testing, and prediction correctness all affect calibration. Among these, prediction correctness, and in particular misprediction, contributes most to miscalibration due to over-confidence. Next, we propose a simple, unifying, and effective approach, namely selective scaling, which separates correct and incorrect predictions for scaling and focuses on smoothing the logits of mispredictions. Then, we review popular existing calibration methods and compare them with selective scaling on semantic segmentation calibration. We conduct extensive experiments on a variety of benchmarks, covering both in-domain and domain-shift calibration, and show that selective scaling consistently outperforms the other methods.
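The core idea of selective scaling, as described above, can be illustrated with a minimal sketch: apply a larger smoothing temperature only to pixels whose predictions are believed to be incorrect. This is an illustrative simplification under assumed inputs (the function name `selective_scaling`, the temperature values, and the `is_correct` mask, which in practice would come from a learned correctness predictor, are all hypothetical, not the paper's exact formulation):

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def selective_scaling(logits, is_correct, t_correct=1.0, t_wrong=2.0):
    """Sketch of selective scaling: pixels flagged as mispredicted get a
    larger temperature, flattening their over-confident distributions,
    while correctly predicted pixels are left (nearly) untouched.

    logits:     array of shape (..., num_classes), per-pixel class logits
    is_correct: boolean array of shape (...,), assumed correctness mask
    """
    temps = np.where(is_correct, t_correct, t_wrong)[..., None]
    return softmax(logits / temps)

# Example: one over-confident, mispredicted "pixel" with 3 classes.
logits = np.array([[4.0, 1.0, 0.5]])
calibrated = selective_scaling(logits, np.array([False]))
uncalibrated = softmax(logits)
# The calibrated confidence of the top class is lower than before.
```

Smoothing only the mispredicted logits lowers confidence exactly where over-confidence causes miscalibration, without degrading the sharpness of correct predictions.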