Modern intrusion detection systems (IDS) leverage graph neural networks (GNNs) to detect malicious activity in system provenance data, but their decisions often remain a black box to analysts. This paper presents a comprehensive XAI framework designed to bridge the trust gap in Security Operations Centers (SOCs) by making graph-based detection transparent. We implement the framework on top of KAIROS, a state-of-the-art temporal graph-based IDS, though the design applies to any temporal graph-based detector with minimal adaptation. The complete codebase is available at https://github.com/devang1304/provex.git. We augment the detection pipeline with post-hoc explanations that highlight why an alert was triggered, identifying the key causal subgraphs and events behind it. We adapt three GNN explanation methods to the temporal provenance context: GraphMask, GNNExplainer, and a variational temporal GNN explainer (VA-TGExplainer). These tools output human-interpretable representations of anomalous behavior, including the most important edges together with uncertainty estimates. Our contributions focus on the practical integration of these explainers, addressing challenges in memory management and reproducibility. We demonstrate the framework on the DARPA CADETS Engagement 3 dataset and show that it produces concise window-level explanations for detected attacks. Our evaluation reveals that the explainers preserve the temporal GNN's decisions with high fidelity, surfacing critical edges such as malicious file interactions and anomalous netflows, at an average explanation overhead of 3-5 seconds per event. By exposing the model's reasoning, the framework aims to improve analyst trust and triage speed.