Setting proper evaluation objectives for explainable artificial intelligence (XAI) is vital for making XAI algorithms follow human communication norms, support human reasoning processes, and fulfill human needs for AI explanations. In this article, we examine explanation plausibility, the most pervasive human-grounded concept in XAI evaluation. Plausibility measures how reasonable the machine explanation is compared to the human explanation. Plausibility has conventionally been formulated as an important evaluation objective for AI explainability tasks. We argue against this idea and show that optimizing and evaluating XAI for plausibility is sometimes harmful and always ineffective in achieving model understandability, transparency, and trustworthiness. Specifically, evaluating XAI algorithms for plausibility regularizes the machine explanation to express exactly the same content as the human explanation, which deviates from the fundamental motivation behind human explanation: conveying similar or alternative reasoning trajectories in understandable forms or language. Optimizing XAI for plausibility regardless of the correctness of the model decision also jeopardizes model trustworthiness, because doing so breaks an important assumption in human-human explanation, namely that plausible explanations typically imply correct decisions; violating this assumption eventually leads to either undertrust or overtrust of AI models. Instead of being the end goal of XAI evaluation, plausibility can serve as an intermediate computational proxy for the human process of interpreting explanations, used to optimize the utility of XAI. We further highlight the importance of explainability-specific evaluation objectives by differentiating the AI explanation task from the object localization task.
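To make the notion of "comparing the machine explanation to the human explanation" concrete, a minimal sketch follows of how plausibility is often quantified in feature-attribution settings, assuming a saliency-map explanation and a human-annotated rationale mask; the function name, the IoU formulation, and the binarization threshold are illustrative assumptions, not a metric prescribed by this article.

```python
import numpy as np

def plausibility_iou(saliency, human_mask, threshold=0.5):
    """Illustrative plausibility score: IoU between a thresholded machine
    saliency map and a binary human-annotated rationale mask.

    saliency   : float array in [0, 1], machine feature-attribution map
    human_mask : 0/1 array of the same shape, human rationale annotation
    threshold  : cut-off for binarizing the saliency map (an assumed choice)
    """
    machine_mask = saliency >= threshold                      # binarize machine explanation
    human_mask = human_mask.astype(bool)
    intersection = np.logical_and(machine_mask, human_mask).sum()
    union = np.logical_or(machine_mask, human_mask).sum()
    return intersection / union if union > 0 else 0.0

# Toy example: a 4x4 saliency map compared against a human annotation
# that highlights the top-left 2x2 patch.
saliency = np.array([[0.9, 0.8, 0.1, 0.0],
                     [0.7, 0.6, 0.2, 0.1],
                     [0.1, 0.2, 0.0, 0.0],
                     [0.0, 0.1, 0.0, 0.0]])
human = np.zeros((4, 4), dtype=int)
human[:2, :2] = 1
print(f"plausibility (IoU) = {plausibility_iou(saliency, human):.2f}")  # -> 1.00
```

Under such a metric, a higher score only means the machine explanation overlaps more with the human annotation; as argued above, optimizing this overlap says nothing about whether the model's decision, or its actual reasoning, is correct.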