Explainable artificial intelligence (XAI) is motivated by the problem of making AI predictions understandable, transparent, and responsible, as AI becomes increasingly impactful in society and high-stakes domains. XAI algorithms are designed to explain AI decisions in human-understandable ways. The evaluation and optimization criteria of XAI are gatekeepers for XAI algorithms to achieve their expected goals and should withstand rigorous inspection. To improve the scientific rigor of XAI, we conduct the first critical examination of a common XAI criterion: plausibility. Plausibility measures how convincing an AI explanation is to humans, and is usually quantified by metrics on feature localization or on the correlation of feature attributions. Our examination shows that, although plausible explanations can improve users' understanding of and local trust in an AI decision, this comes at the cost of abandoning other possible approaches to enhancing understandability, increasing the number of misleading explanations that manipulate users, failing to achieve complementary human-AI task performance, and eroding users' global trust in the overall AI system. Because the flaws outweigh the benefits, we do not recommend using plausibility as a criterion to evaluate or optimize XAI algorithms. We also identify new directions for improving the understandability of XAI and its utility to users, including complementary human-AI task performance.