Graph Neural Networks (GNNs), which generalize traditional deep neural networks to graph data, have achieved state-of-the-art performance on several graph analytical tasks such as node classification, link prediction, and graph classification. We focus on how trained GNN models can leak information about the \emph{member} nodes they were trained on. In particular, we ask: given a graph, can we determine which of its nodes were used to train the GNN model? We operate in the inductive setting for node classification, meaning that none of the nodes in the test set (the \emph{non-member} nodes) were seen during training. We propose a simple attack model that distinguishes between member and non-member nodes using only black-box access to the trained model. We experimentally compare the privacy risks of four representative GNN models, and our results show that all of the studied GNN models are vulnerable to privacy leakage. While overfitting is considered the main cause of such leakage in traditional machine learning models, we show that in GNNs the additional structural information is the major contributing factor.
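To make the black-box attack setting concrete, the sketch below illustrates a simple confidence-based membership inference heuristic against a node classifier: nodes whose predicted posteriors are sharp (low entropy) are flagged as members. This is only a minimal illustration under assumed interfaces, not the paper's attack model; the simulated posteriors, the two-feature summary, and the entropy threshold are hypothetical choices.

```python
# Minimal sketch of a confidence-based membership inference attack against a
# black-box node classifier. NOT the paper's exact attack model: the simulated
# posteriors and the fixed entropy threshold are illustrative assumptions.
import numpy as np


def posterior_features(posteriors: np.ndarray) -> np.ndarray:
    """Summarize each node's posterior vector by max confidence and entropy."""
    eps = 1e-12
    confidence = posteriors.max(axis=1)
    entropy = -(posteriors * np.log(posteriors + eps)).sum(axis=1)
    return np.stack([confidence, entropy], axis=1)


def membership_attack(posteriors: np.ndarray, entropy_threshold: float = 1.0) -> np.ndarray:
    """Flag nodes with low-entropy (confident) predictions as training members."""
    feats = posterior_features(posteriors)
    return (feats[:, 1] < entropy_threshold).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated black-box outputs over 7 classes: member nodes tend to receive
    # sharper (more confident) posteriors than non-member nodes.
    member_logits = rng.normal(0.0, 3.0, size=(100, 7))
    nonmember_logits = rng.normal(0.0, 1.0, size=(100, 7))
    logits = np.vstack([member_logits, nonmember_logits])
    posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    preds = membership_attack(posteriors)
    labels = np.array([1] * 100 + [0] * 100)
    print("attack accuracy:", (preds == labels).mean())
```

In practice such a heuristic would be applied to the posteriors returned by the target GNN for candidate nodes; a learned attack classifier over these features plays the same role, but the thresholding version keeps the example self-contained.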