While a vast collection of explainable AI (XAI) algorithms has been developed in recent years, these algorithms are often criticized for significant gaps with how humans produce and consume explanations. As a result, current XAI techniques are often found to be hard to use and lacking in effectiveness. In this work, we attempt to close these gaps by making AI explanations selective, a fundamental property of human explanations: from a large set of model reasons, we present only the subset that aligns with the recipient's preferences. We propose a general framework for generating selective explanations by leveraging human input on a small sample. This framework opens up a rich design space that accounts for different selectivity goals, types of input, and more. As a showcase, we use a decision-support task to explore selective explanations based on what the decision-maker would consider relevant to the decision task. We conducted two experimental studies to examine three paradigms out of the broader set made possible by our framework: in Study 1, we ask participants to provide their own input to generate selective explanations, with either open-ended or critique-based input. In Study 2, we show participants selective explanations based on input from a panel of similar users (annotators). Our experiments demonstrate the promise of selective explanations in reducing over-reliance on AI and improving decision outcomes and subjective perceptions of the AI, but they also paint a nuanced picture that attributes some of these positive effects to the opportunity to provide one's own input to augment AI explanations. Overall, our work proposes a novel XAI framework inspired by human communication behaviors and demonstrates its potential, which we hope will encourage future work to better align AI explanations with how humans produce and consume explanations.