Graph neural networks, a popular class of models effective in a wide range of graph-based learning tasks, have been shown to be vulnerable to adversarial attacks. While the majority of the literature focuses on such vulnerability in node-level classification tasks, little effort has been dedicated to analysing adversarial attacks on graph-level classification, an important problem with numerous real-life applications in fields such as biochemistry and social network analysis. The few existing methods often require unrealistic setups, such as access to internal information of the victim model, or an impractically large number of queries. We present a novel Bayesian optimisation-based attack method for graph classification models. Our method is black-box, query-efficient and parsimonious with respect to the perturbation applied. We empirically validate the effectiveness and flexibility of the proposed method on a wide range of graph classification tasks involving varying graph properties, constraints and modes of attack. Finally, we analyse common interpretable patterns behind the adversarial samples produced, which may shed further light on the adversarial robustness of graph classification models.
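To make the setting concrete, below is a minimal sketch of a query-efficient black-box attack loop in the spirit described above. Everything in it is an illustrative assumption rather than the paper's actual method: the victim is a toy logistic model over cheap structural features, the search space is single edge flips, and the surrogate is an off-the-shelf Gaussian process from scikit-learn.

```python
# A hedged sketch: black-box adversarial attack on a graph classifier
# via Bayesian optimisation over candidate edge flips.
import itertools
import numpy as np
import networkx as nx
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def graph_features(g):
    # Cheap structural descriptors, used both by the toy victim and as
    # the surrogate's representation of a perturbed graph.
    degs = np.array([d for _, d in g.degree()], dtype=float)
    return np.array([degs.mean(), degs.std(), nx.density(g),
                     nx.transitivity(g)])

rng = np.random.default_rng(0)
w = rng.normal(size=4)  # hypothetical victim's (hidden) weights

def victim_margin(g):
    # The only interface the attacker has: a scalar confidence for the
    # true class. No gradients, no internals -- i.e. a black-box query.
    return 1.0 / (1.0 + np.exp(-graph_features(g) @ w))

def flip_edge(g, e):
    h = g.copy()
    if h.has_edge(*e):
        h.remove_edge(*e)
    else:
        h.add_edge(*e)
    return h

g0 = nx.erdos_renyi_graph(20, 0.2, seed=1)
candidates = list(itertools.combinations(range(20), 2))  # one-flip perturbations
feats = np.array([graph_features(flip_edge(g0, e)) for e in candidates])

# Seed the surrogate with a handful of random queries, then iterate.
queried = list(rng.choice(len(candidates), size=5, replace=False))
y = [victim_margin(flip_edge(g0, candidates[i])) for i in queried]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):  # total query budget stays small by design
    gp.fit(feats[queried], np.array(y))
    mu, sd = gp.predict(feats, return_std=True)
    score = mu - 1.96 * sd          # lower confidence bound: minimise margin
    score[queried] = np.inf        # never re-query a known perturbation
    best = int(np.argmin(score))
    queried.append(best)
    y.append(victim_margin(flip_edge(g0, candidates[best])))

print(f"victim margin: {victim_margin(g0):.3f} -> {min(y):.3f} after "
      f"{len(queried)} queries and a single edge flip")
```

The lower-confidence-bound acquisition here is one simple way to trade off exploiting flips the surrogate already rates as damaging against exploring uncertain ones, which is what keeps the query count low; restricting perturbations to a single edge flip mirrors the parsimony constraint mentioned in the abstract.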