Machine learning models have been shown to face a severe threat from model extraction attacks, in which a well-trained private model owned by a service provider can be stolen by an attacker posing as a client. Unfortunately, prior work focuses on models trained over Euclidean data, e.g., images and texts, while how to extract a GNN model that operates on graph structure and node features remains unexplored. In this paper, for the first time, we comprehensively investigate and develop model extraction attacks against GNN models. We first systematically formalise the threat model in the context of GNN model extraction and classify the adversarial threats into seven categories according to the attacker's background knowledge, e.g., the attributes and/or neighbour connections of the nodes accessible to the attacker. We then present detailed attack methods that exploit the knowledge accessible under each threat. Evaluations on three real-world datasets show that our attacks extract duplicated models effectively, i.e., 84%-89% of the inputs in the target domain receive the same predictions as from the victim model.
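To make the fidelity-style extraction setting concrete, below is a minimal sketch (not the paper's exact method) in which an attacker who can access node features and graph structure queries a black-box victim GCN for labels and trains a surrogate on them, then measures agreement with the victim. The model definitions, the toy random graph, and the node split between provider and attacker are all illustrative assumptions.

```python
# Illustrative GNN model extraction sketch (assumed setup, not the paper's method).
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalized_adj(edge_index, n):
    """Symmetrically normalized adjacency D^-1/2 (A + I) D^-1/2."""
    A = torch.zeros(n, n)
    A[edge_index[0], edge_index[1]] = 1.0
    A = A + torch.eye(n)                      # add self-loops
    d_inv_sqrt = A.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)

class GCN(nn.Module):
    """Two-layer GCN: logits = A_hat ReLU(A_hat X W1) W2."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, n_classes)

    def forward(self, x, a_hat):
        h = F.relu(a_hat @ self.lin1(x))
        return self.lin2(a_hat @ h)

def train(model, x, a_hat, labels, idx, epochs=200, lr=0.01):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(x, a_hat)[idx], labels[idx])
        loss.backward()
        opt.step()

# Toy random graph standing in for a real dataset.
torch.manual_seed(0)
n, d, c = 300, 16, 4
x = torch.randn(n, d)
edge_index = torch.randint(0, n, (2, 1200))
a_hat = normalized_adj(edge_index, n)
true_labels = torch.randint(0, c, (n,))

# Service provider trains the victim model on its labelled nodes.
victim = GCN(d, 32, c)
train(victim, x, a_hat, true_labels, torch.arange(n // 2))

# Attacker queries the victim (black-box) and trains a surrogate
# on the returned labels for the nodes it can access.
attacker_idx = torch.arange(n // 2, n)
with torch.no_grad():
    stolen_labels = victim(x, a_hat).argmax(dim=1)

surrogate = GCN(d, 32, c)
train(surrogate, x, a_hat, stolen_labels, attacker_idx)

# Fidelity: fraction of nodes where the surrogate agrees with the victim.
with torch.no_grad():
    agree = (surrogate(x, a_hat).argmax(1) == victim(x, a_hat).argmax(1)).float().mean()
print(f"fidelity (agreement with victim): {agree:.2%}")
```

The agreement rate printed at the end corresponds to the fidelity notion used in the abstract (the fraction of target-domain inputs on which the extracted model matches the victim's predictions).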