Neural networks lack adversarial robustness: they are vulnerable to adversarial examples, small input perturbations that cause incorrect predictions. Further, trust is undermined when models produce miscalibrated predictions, i.e., when the predicted probability is not a good indicator of how much we should trust the model. In this paper, we study the connection between adversarial robustness and calibration and find that inputs for which the model is sensitive to small perturbations (i.e., easily attacked) are more likely to receive poorly calibrated predictions. Based on this insight, we examine whether calibration can be improved by addressing those adversarially unrobust inputs. To this end, we propose Adversarial Robustness based Adaptive Label Smoothing (AR-AdaLS), which integrates the correlation between adversarial robustness and calibration into training by adaptively softening the label of each example according to how easily it can be attacked by an adversary. We find that our method, by taking the adversarial robustness of the in-distribution data into consideration, yields better-calibrated models even under distributional shift. In addition, AR-AdaLS can be applied to ensemble models to further improve calibration.
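To illustrate the core idea of adaptively softening labels, here is a minimal sketch. It assumes a per-example robustness score in [0, 1] (1 = hard to attack) is already available; the function name, the `eps_max` hyperparameter, and the linear mapping from robustness to smoothing strength are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ar_adaptive_label_smoothing(labels_one_hot, robustness, eps_max=0.2):
    """Soften one-hot labels more for examples that are easier to attack.

    labels_one_hot: (N, C) one-hot targets.
    robustness: (N,) scores in [0, 1]; 1 = hard to attack, 0 = easy.
    eps_max: maximum smoothing strength (hypothetical hyperparameter).
    """
    _, num_classes = labels_one_hot.shape
    # Less robust examples (low score) receive a larger smoothing factor.
    eps = eps_max * (1.0 - np.asarray(robustness))[:, None]
    # Standard label-smoothing mixture, but with a per-example epsilon.
    return (1.0 - eps) * labels_one_hot + eps / num_classes

# Example: the robust input keeps a hard label; the fragile one is softened.
targets = np.eye(3)[[0, 1]]
soft = ar_adaptive_label_smoothing(targets, robustness=[1.0, 0.0])
```

Each output row remains a valid probability distribution, so the softened targets can be dropped directly into a cross-entropy loss.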