A multitude of explainability methods and associated fidelity metrics have been proposed to help better understand how modern AI systems make decisions. However, much of the current work has remained theoretical -- without much consideration for the human end-user. In particular, it is not yet known (1) how useful current explainability methods are in practice for real-world scenarios and (2) how well the associated performance metrics predict how much knowledge individual explanations contribute to a human end-user trying to understand the inner workings of the system. To fill this gap, we conducted psychophysics experiments at scale to evaluate the ability of human participants to leverage representative attribution methods to understand the behavior of different image classifiers across three real-world scenarios: identifying bias in an AI system, characterizing the visual strategy it uses for tasks that are too difficult for an untrained, non-expert human observer, and understanding its failure cases. Our results demonstrate that the degree to which individual attribution methods help human participants better understand an AI system varied widely across these scenarios. This suggests a critical need for the field to move past quantitative improvements of current attribution methods towards the development of complementary approaches that provide qualitatively different sources of information to human end-users.