Training graph classifiers that distinguish healthy brains from dysfunctional ones can help identify substructures associated with specific cognitive phenotypes. However, the mere predictive power of a graph classifier is of limited interest to neuroscientists, who already have plenty of tools for diagnosing specific mental disorders. What matters is the interpretation of the model, as it can provide novel insights and new hypotheses. In this paper we propose \emph{counterfactual graphs} as a way to produce local post-hoc explanations of any black-box graph classifier. Given a graph and a black-box, a counterfactual is a graph that, while having high structural similarity to the original graph, is assigned by the black-box to a different class. We propose and empirically compare several strategies for counterfactual graph search. Our experiments against a white-box classifier with a known optimal counterfactual show that our methods, although heuristic, can produce counterfactuals very close to the optimal one. Finally, we show how to use counterfactual graphs to build global explanations that correctly capture the behaviour of different black-box classifiers and provide interesting insights for neuroscientists.
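To make the definition of a counterfactual graph concrete, the following is a minimal sketch of one simple search heuristic. It is an illustration only and is not claimed to be one of the strategies compared in the paper. It assumes graphs are undirected and stored as sets of node pairs `(i, j)` with `i < j`, that structural similarity is measured by the number of differing edges, and that the black-box is any function from a graph to a class label. The names `edit_distance`, `find_counterfactual`, and `toy_black_box` are hypothetical.

```python
from itertools import combinations


def edit_distance(g1, g2):
    """Number of edges present in exactly one of the two graphs."""
    return len(g1 ^ g2)


def find_counterfactual(graph, n_nodes, black_box):
    """Flip edges until the predicted class changes, then undo needless flips.

    Returns a frozenset of edges, or None if no counterfactual is found.
    """
    original_class = black_box(graph)
    candidate = set(graph)
    flipped = []

    # Forward phase: toggle node pairs in a fixed order until the label changes.
    for pair in combinations(range(n_nodes), 2):
        if black_box(frozenset(candidate)) != original_class:
            break
        candidate ^= {pair}  # add the edge if absent, remove it if present
        flipped.append(pair)

    if black_box(frozenset(candidate)) == original_class:
        return None  # every pair was toggled and the label never changed

    # Backward phase: revert each flip that is not needed to keep the new label.
    for pair in flipped:
        candidate ^= {pair}
        if black_box(frozenset(candidate)) == original_class:
            candidate ^= {pair}  # the flip was necessary, so redo it

    return frozenset(candidate)


def toy_black_box(graph):
    """Hypothetical classifier (an assumption for this example only):
    class 1 if node 0 has at least 3 neighbours, class 0 otherwise."""
    return int(sum(1 for u, v in graph if 0 in (u, v)) >= 3)


# Usage: a 5-node graph that the toy classifier assigns to class 0.
original = frozenset({(0, 1), (1, 2), (2, 3)})
counterfactual = find_counterfactual(original, 5, toy_black_box)
print(sorted(counterfactual), edit_distance(original, counterfactual))
```

The backward phase is what keeps the result structurally close to the input: any edge change that is not required to hold the new label is undone. A heuristic like this gives no optimality guarantee, which is why comparing against a white-box classifier with a known optimal counterfactual is a useful check.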