Backdoor attacks are a powerful class of attacks against deep learning models. Recently, GNNs have been shown to be vulnerable to backdoor attacks, particularly on the graph classification task. In this paper, we propose the first backdoor detection and defense method for GNNs. Most backdoor attacks rely on injecting a small but influential trigger into clean samples. For graph data, current backdoor attacks focus on manipulating the graph structure to inject the trigger. We find that benign and malicious samples differ markedly on several explanation evaluation metrics, such as fidelity and infidelity. Once a malicious sample is identified, the explainability of the GNN model helps us capture the most significant subgraph, which is likely the trigger in a trojan graph. We evaluate our defense method on various datasets and under different attack settings, and the attack success rate decreases considerably in all cases.
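For reference, the fidelity metric common in the GNN explainability literature, and presumably the one meant above, measures the drop in the model's predicted probability when the explanation subgraph is removed. A standard formulation (our assumption about the exact variant used) is

\[
\mathrm{Fidelity} = \frac{1}{N}\sum_{i=1}^{N}\Big( f(G_i)_{y_i} - f(G_i \setminus m_i)_{y_i} \Big),
\]

where \(f(G)_{y}\) is the predicted probability of class \(y\) for graph \(G\), and \(m_i\) is the explanation subgraph for \(G_i\). Intuitively, because a trojan graph's prediction hinges on the trigger, deleting the explainer's top subgraph should cause an unusually large probability drop, which would produce the separation between benign and malicious samples that the detection relies on.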