In this paper, we show through human-subject studies that counterfactual explanations of confidence scores help users better understand and better trust an AI model's predictions. Displaying confidence scores in human-agent interaction systems can help build trust between humans and AI systems. However, most existing research uses the confidence score only as a form of communication, and we still lack ways of explaining why the algorithm is confident. This paper also presents two methods for understanding model confidence using counterfactual explanation: (1) based on counterfactual examples; and (2) based on visualisation of the counterfactual space.
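To illustrate idea (1), the sketch below shows one possible way to generate a counterfactual example for a model's confidence score: starting from an input the model classifies with high confidence, it searches for a small perturbation that pulls that confidence below a target threshold. This is a minimal sketch, not the paper's method; the iris dataset, logistic regression classifier, greedy random search, and the helper name `confidence_counterfactual` are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's method): find a nearby input
# whose predicted confidence drops below a target, as a counterfactual
# explanation of the model's confidence for the original input.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

def confidence(model, x):
    """Confidence = predicted probability of the model's most likely class."""
    return model.predict_proba(x.reshape(1, -1)).max()

def confidence_counterfactual(model, x, target=0.6, step=0.05, max_iter=200, seed=0):
    """Greedy random search for a small perturbation of x whose confidence
    falls below `target`. Returns the perturbed input and its confidence."""
    rng = np.random.default_rng(seed)
    best = x.copy()
    for _ in range(max_iter):
        candidate = best + rng.normal(scale=step, size=x.shape)
        if confidence(model, candidate) < confidence(model, best):
            best = candidate  # keep the perturbation that lowers confidence
        if confidence(model, best) < target:
            break
    return best, confidence(model, best)

x0 = X[0]
cf, cf_conf = confidence_counterfactual(clf, x0)
print("original confidence:", round(confidence(clf, x0), 3))
print("counterfactual confidence:", round(cf_conf, 3))
print("feature change:", np.round(cf - x0, 3))
```

The resulting feature change can be read as "the model would no longer be confident if the input looked like this instead", which is the kind of statement a counterfactual confidence explanation aims to convey to users.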