Reliable confidence estimation for predictions is important in many safety-critical applications. However, modern deep neural networks are often overconfident in their incorrect predictions. Recently, many calibration methods have been proposed to alleviate this overconfidence problem. A primary practical purpose of calibrated confidence is to detect misclassification errors by filtering out low-confidence predictions, a task known as failure prediction. In this paper, we identify a general, widespread, but largely neglected phenomenon: most confidence calibration methods are useless or harmful for failure prediction. We investigate this problem and reveal that popular confidence calibration methods often worsen the confidence separation between correct and incorrect samples, making it more difficult to decide whether to trust a prediction. Finally, inspired by the natural connection between flat minima and confidence separation, we propose a simple hypothesis: flat minima are beneficial for failure prediction. We verify this hypothesis through extensive experiments and further boost performance by combining two different flat minima techniques. Our code is available at https://github.com/Impression2805/FMFP
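To make the notions of failure prediction and confidence separation concrete, the following is a minimal sketch, not the code from the FMFP repository. It assumes the maximum softmax probability as the confidence score and a rejection threshold of 0.5; the function names `filter_by_confidence` and `failure_prediction_auroc`, and the toy logits, are hypothetical and chosen only for illustration. The AUROC here measures how well confidence separates correct from incorrect predictions: 1.0 means every correct prediction is more confident than every incorrect one, and 0.5 means no separation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def filter_by_confidence(logits, labels, threshold):
    # Assumption: confidence = maximum softmax probability.
    probs = softmax(logits)
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    correct = pred == labels
    accepted = conf >= threshold  # low-confidence predictions are rejected
    return conf, correct, accepted

def failure_prediction_auroc(conf, correct):
    # Probability that a random correct prediction has higher confidence
    # than a random incorrect one (ties count as 0.5).
    pos, neg = conf[correct], conf[~correct]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one correct and one incorrect sample")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Toy logits for four samples and three classes (illustrative values only).
logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 3.0, 0.0],
                   [1.0, 0.9, 0.0],
                   [0.0, 0.2, 0.1]])
labels = np.array([0, 1, 1, 2])

conf, correct, accepted = filter_by_confidence(logits, labels, threshold=0.5)
print("accepted:", accepted)
print("AUROC:", failure_prediction_auroc(conf, correct))
```

A method that improves calibration can still lower this AUROC if it pushes the confidence of correct and incorrect predictions closer together, which is the effect the abstract describes.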