A method for the systematic generation of graph XAI benchmarks via Weisfeiler-Leman coloring

Graph neural networks (GNNs) have become the de facto model for learning from structured data. However, the decision-making process of GNNs remains opaque to the end user, which undermines their use in safety-critical applications. Several explainable AI (XAI) techniques for graphs have been developed to address this issue. In graph classification, these explainers identify subgraph motifs that explain predictions. Robust benchmarking of graph explainers is therefore required to ensure that the produced explanations are of high quality, i.e., aligned with the GNN's decision process. However, current graph XAI benchmarks are limited to simplistic synthetic datasets or a few real-world tasks curated by domain experts, which hinders rigorous and reproducible evaluation and consequently stalls progress in the field. To overcome these limitations, we propose a method to automate the construction of graph XAI benchmarks from generic graph classification datasets. Our approach leverages the Weisfeiler-Leman (WL) color refinement algorithm to efficiently perform approximate subgraph matching and mine class-discriminating motifs, which serve as proxy ground-truth class explanations. At the same time, we ensure that these motifs can be learned by GNNs because their discriminating power aligns with WL expressiveness. This work also introduces the OpenGraphXAI benchmark suite, which consists of 15 ready-made graph XAI datasets derived by applying our method to real-world molecular classification datasets. The suite is publicly available along with a codebase to generate over 2,000 additional graph XAI benchmarks. Finally, we present a use case that illustrates how the suite can be used to assess the effectiveness of a selection of popular graph explainers, demonstrating the critical role of a sufficiently large benchmark collection for improving the significance of experimental results.
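
To make the core idea concrete, the following is a minimal sketch of 1-dimensional WL color refinement, followed by a simple count of how many graphs of each class contain each color. It is not the authors' implementation: the function names `wl_refine` and `color_class_support`, the adjacency-dict graph format, and the toy graphs are all assumptions made for illustration, and the abstract does not specify how motifs are actually scored or selected. A WL color at round k summarizes the k-hop neighborhood of a node, so a color that occurs only in graphs of one class can be read as a candidate class-discriminating pattern.

```python
from collections import Counter


def wl_refine(adj, labels, rounds=2):
    """Run 1-WL color refinement for a fixed number of rounds.

    adj:    dict mapping each node to a list of its neighbors
    labels: dict mapping each node to its initial label (e.g. atom type)
    Returns a list of {node: color} dicts, one per round (round 0 included).
    Colors are kept as nested tuples so they are comparable across graphs
    without a shared relabeling table (illustrative choice, not efficient).
    """
    colors = dict(labels)
    history = [dict(colors)]
    for _ in range(rounds):
        # New color = own color plus the sorted multiset of neighbor colors.
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        history.append(dict(colors))
    return history


def color_class_support(graphs, rounds=2):
    """Count, for every WL color, how many graphs of each class contain it.

    graphs: list of (adj, labels, y) triples, where y is the class label.
    Returns {color: Counter({class: number_of_graphs})}.
    """
    support = {}
    for adj, labels, y in graphs:
        seen = set()
        for colors in wl_refine(adj, labels, rounds):
            seen.update(colors.values())
        for c in seen:
            support.setdefault(c, Counter())[y] += 1
    return support


# Toy data (hypothetical): a triangle in class 1, a 3-node path in class 0.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
atom = {0: "C", 1: "C", 2: "C"}

support = color_class_support([(triangle, atom, 1), (path, atom, 0)])

# Colors seen in class-1 graphs only: candidate discriminating patterns.
only_class_1 = [c for c, n in support.items() if n[1] > 0 and n[0] == 0]
```

In this toy example the two graphs share their round-0 color and one round-1 color, and only the second refinement round separates the triangle from the path. A real pipeline would additionally need a scoring threshold and a way to map a selected color back to the nodes and edges of the subgraph it summarizes.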
