In hybrid human-AI systems, users need to decide whether to trust an algorithmic prediction even though the true error in that prediction is unknown. To accommodate such settings, we introduce RETRO-VIZ, a method for (i) estimating and (ii) explaining the trustworthiness of regression predictions. It consists of RETRO, a quantitative estimate of the trustworthiness of a prediction, and VIZ, a visual explanation that helps users identify the reasons for the (lack of) trustworthiness of a prediction. We find that RETRO-scores correlate negatively with prediction error across 117 experimental settings, indicating that RETRO provides a useful measure for distinguishing trustworthy predictions from untrustworthy ones. In a user study with 41 participants, we find that VIZ-explanations help users identify whether a prediction is trustworthy: given a pair of predictions, on average 95.1% of participants correctly select the more trustworthy one. In addition, an average of 75.6% of participants can accurately describe why a prediction seems (un)trustworthy. Finally, we find that the vast majority of users subjectively experience RETRO-VIZ as a useful tool for assessing the trustworthiness of algorithmic predictions.