Graph Neural Networks (GNNs) have boosted the performance of many graph-related tasks. Despite this great success, recent studies have shown that GNNs are highly vulnerable to adversarial attacks, where adversaries can mislead a GNN's predictions by modifying the graph. On the other hand, GNN explanation methods such as GNNExplainer provide a better understanding of a trained GNN model by identifying a small subgraph and the features that are most influential for its prediction. In this paper, we first perform empirical studies to validate that GNNExplainer can act as an inspection tool and has the potential to detect adversarial perturbations on graphs. This finding motivates us to investigate a new problem: can a graph neural network and its explanations be jointly attacked by maliciously modifying graphs? This question is challenging to answer, since the goals of misleading the GNN and bypassing the GNNExplainer essentially contradict each other. In this work, we give an affirmative answer by proposing a novel attack framework (GEAttack), which attacks both a GNN model and its explanations by simultaneously exploiting their vulnerabilities. Extensive experiments with two explainers (GNNExplainer and PGExplainer) on various real-world datasets demonstrate the effectiveness of the proposed method.
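To make the explanation mechanism referenced above concrete, the following is a minimal sketch (not the paper's implementation, and not the GEAttack framework) of the edge-mask optimization idea behind GNNExplainer-style explanations: a soft mask over existing edges is learned so that the masked subgraph preserves the model's prediction for a target node while staying sparse. The `TinyGCN` model, the `explain_node` helper, and all hyperparameters are illustrative assumptions, not names from the paper.

```python
# Hypothetical sketch of GNNExplainer-style edge masking (illustrative only).
import torch
import torch.nn.functional as F


class TinyGCN(torch.nn.Module):
    """Toy one-layer message-passing model on a dense adjacency matrix (assumed for illustration)."""

    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, num_classes)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Mean aggregation over (possibly soft-masked) neighbors, followed by a linear layer.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return self.lin(adj @ x / deg)


def explain_node(model, x, adj, node_idx, epochs=200, lr=0.05, sparsity=0.005):
    """Learn a sigmoid edge mask that preserves the model's prediction for `node_idx`."""
    model.eval()
    with torch.no_grad():
        target = model(x, adj)[node_idx].argmax()  # class predicted on the clean graph

    edge_mask_logits = torch.zeros_like(adj, requires_grad=True)
    opt = torch.optim.Adam([edge_mask_logits], lr=lr)

    for _ in range(epochs):
        opt.zero_grad()
        mask = torch.sigmoid(edge_mask_logits) * adj  # only mask edges that exist
        log_probs = F.log_softmax(model(x, mask)[node_idx], dim=-1)
        # Keep the original prediction while encouraging a small explanation subgraph.
        loss = -log_probs[target] + sparsity * mask.sum()
        loss.backward()
        opt.step()

    return (torch.sigmoid(edge_mask_logits) * adj).detach()  # soft importance per edge


# Usage on a random toy graph: edges with high mask values form the explanation subgraph.
torch.manual_seed(0)
n, d = 8, 4
adj = (torch.rand(n, n) > 0.6).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(0)
x = torch.randn(n, d)
model = TinyGCN(d, num_classes=3)
edge_importance = explain_node(model, x, adj, node_idx=0)
print(edge_importance)
```

Under this view, an attacker who wants to evade inspection must perturb the graph so that the prediction changes while the learned edge mask still highlights an innocuous-looking subgraph, which is the tension the paper's joint attack has to resolve.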