Calibration strengthens the trustworthiness of black-box models by producing more accurate confidence estimates on given examples. However, little is known about whether model explanations can help confidence calibration. Intuitively, humans look at important feature attributions and decide whether the model is trustworthy. Similarly, explanations can tell us when the model may or may not know. Inspired by this, we propose a method named CME that leverages model explanations to make the model less confident on examples with non-inductive attributions. The idea is that when the model is not highly confident, it is difficult to identify strong indications of any class, so the tokens accordingly do not receive high attribution scores for any class, and vice versa. We conduct extensive experiments on six datasets with two popular pre-trained language models in both in-domain and out-of-domain settings. The results show that CME improves calibration performance in all settings. The expected calibration errors are further reduced when CME is combined with temperature scaling. Our findings highlight that model explanations can help calibrate posterior estimates.
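To make the intuition concrete, the sketch below is a hypothetical Python illustration only, not the paper's actual CME formulation: it assumes per-token, per-class attribution scores are already available from some explanation method, and it uses an invented temperature rule (the names `cme_sketch` and `alpha` are illustrative) to flatten the posterior when no token strongly indicates any class.

```python
import numpy as np

def cme_sketch(logits, attributions, alpha=1.0):
    """Hypothetical sketch of the CME intuition: dampen confidence when
    token attributions give no strong indication of any class.

    logits:        (num_classes,) raw model scores for one example
    attributions:  (num_tokens, num_classes) per-token attribution scores
                   from any explanation method (e.g., gradient x input)
    alpha:         hypothetical strength hyperparameter (an assumption)
    """
    # Strength of the strongest class indication among all tokens.
    evidence = np.abs(attributions).max()
    # Weak evidence -> high temperature -> flatter (less confident) posterior;
    # strong evidence -> temperature close to 1 -> confidence is kept.
    temperature = 1.0 + alpha / (evidence + 1e-8)
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    return probs / probs.sum()

# Toy usage: weak, near-uniform attributions should yield a flatter posterior.
logits = np.array([2.0, 0.5, -1.0])
attributions = np.random.uniform(-0.1, 0.1, size=(12, 3))
print(cme_sketch(logits, attributions))
```

The design choice here mirrors the stated idea: attribution strength acts as a proxy for how much inductive evidence the model has, and low evidence is translated into lower confidence before (or in combination with) temperature scaling.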