To make Explainable AI (XAI) systems trustworthy, understanding harmful effects is just as important as producing well-designed explanations. In this paper, we address an important yet unarticulated type of negative effect in XAI. We introduce explainability pitfalls (EPs), unanticipated negative downstream effects from AI explanations that manifest even when there is no intention to manipulate users. EPs are different from, yet related to, dark patterns, which are intentionally deceptive practices. We articulate the concept of EPs by demarcating it from dark patterns and highlighting the challenges arising from uncertainties around pitfalls. We situate and operationalize the concept using a case study that showcases how, despite best intentions, unsuspected negative effects such as unwarranted trust in numerical explanations can emerge. We propose proactive and preventative strategies to address EPs at three interconnected levels: research, design, and organizational.