Explainable artificial intelligence (XAI) is essential for enabling clinical users to obtain informed decision support from AI and to comply with evidence-based medical practice. Applying XAI in clinical settings requires proper evaluation criteria to ensure that an explanation technique is both technically sound and clinically useful, yet specific support for this goal is lacking. To bridge this research gap, we propose the Clinical XAI Guidelines, which consist of five criteria that a clinical XAI technique should be optimized for. The guidelines recommend choosing an explanation form based on Guideline 1 (G1) Understandability and G2 Clinical relevance. For the chosen explanation form, the specific XAI technique should be optimized for G3 Truthfulness, G4 Informative plausibility, and G5 Computational efficiency. Following the guidelines, we conducted a systematic evaluation of a novel problem, multi-modal medical image explanation, on two clinical tasks, and proposed new evaluation metrics accordingly. Sixteen commonly used heatmap XAI techniques were evaluated and found to be insufficient for clinical use because they failed to meet G3 and G4. Our evaluation demonstrates how the Clinical XAI Guidelines can support the design and evaluation of clinically viable XAI.
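To make G3 and G4 concrete, the sketch below shows two generic heatmap checks: a pixel-deletion faithfulness test, a common proxy for G3 Truthfulness, and an overlap score against a clinician-annotated region, a proxy for G4 Informative plausibility. This is a minimal, illustrative sketch, not the paper's exact protocol or metrics; it assumes a PyTorch image classifier, and the `model`, `image`, `heatmap`, and `annotation` arguments are hypothetical placeholders.

```python
import numpy as np
import torch


@torch.no_grad()
def deletion_auc(model, image, heatmap, steps=20, baseline=0.0):
    # G3 (Truthfulness) proxy: delete the pixels the heatmap ranks as most
    # important, a fraction at a time, and track the predicted class
    # probability. A truthful heatmap drives the probability down quickly,
    # giving a low area under the deletion curve.
    # Assumes: image is a torch tensor (C, H, W); heatmap is a numpy array
    # (H, W); model maps a (1, C, H, W) batch to class logits.
    model.eval()
    c, h, w = image.shape
    order = np.argsort(-heatmap.reshape(-1))          # most important first
    target = model(image.unsqueeze(0)).softmax(dim=-1).argmax().item()
    probs = []
    for k in range(steps + 1):
        n_removed = round(k / steps * order.size)
        masked = image.clone().reshape(c, -1)
        masked[:, torch.as_tensor(order[:n_removed])] = baseline
        p = model(masked.reshape(1, c, h, w)).softmax(dim=-1)[0, target]
        probs.append(p.item())
    return float(np.mean(probs))                      # mean prob ~ deletion AUC


def plausibility_iou(heatmap, annotation, top_frac=0.1):
    # G4 (Informative plausibility) proxy: IoU between the heatmap's
    # top-ranked pixels and a clinician-annotated binary mask (numpy,
    # (H, W)) of the clinically relevant region.
    k = max(1, int(top_frac * heatmap.size))
    thresh = np.partition(heatmap.reshape(-1), -k)[-k]
    pred = heatmap >= thresh
    inter = np.logical_and(pred, annotation).sum()
    union = np.logical_or(pred, annotation).sum()
    return float(inter) / max(float(union), 1.0)
```

In practice such scores would be averaged over a test set, and design choices such as the deletion baseline (zeros, blur, or mean intensity) and the top-pixel fraction materially affect the numbers, so they should be fixed before comparing XAI techniques.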