We present evidence for the existence and effectiveness of adversarial attacks on graph neural networks (GNNs) that aim to degrade fairness. These attacks can disadvantage a particular subgroup of nodes in GNN-based node classification, where nodes of the underlying network have sensitive attributes, such as race or gender. We conduct qualitative and experimental analyses explaining how adversarial link injection impairs the fairness of GNN predictions. For example, an attacker can compromise the fairness of GNN-based node classification by injecting adversarial links between nodes belonging to opposite subgroups and opposite class labels. Our experiments on empirical datasets demonstrate that adversarial fairness attacks can significantly degrade the fairness of GNN predictions (attacks are effective) with a low perturbation rate (attacks are efficient) and without a significant drop in accuracy (attacks are deceptive). This work demonstrates the vulnerability of GNN models to adversarial fairness attacks. We hope our findings raise awareness about this issue in our community and lay a foundation for the future development of GNN models that are more robust to such attacks.
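To make the attack strategy described above concrete, the following is a minimal illustrative sketch (not the paper's actual attack algorithm): it injects a small budget of links between node pairs that belong to opposite sensitive subgroups and opposite class labels. The function name, the plain adjacency-matrix representation, and the random choice among candidate pairs are assumptions made for illustration only; a real attack would typically rank candidate edges by their estimated impact on a fairness metric.

```python
import numpy as np


def inject_fairness_attack_links(adj, labels, sensitive, budget, seed=None):
    """Hypothetical sketch: add `budget` links between node pairs that have
    different sensitive-attribute values AND different class labels.

    adj       : (n, n) symmetric 0/1 adjacency matrix
    labels    : (n,) class labels
    sensitive : (n,) sensitive-attribute values (e.g. 0/1 subgroups)
    budget    : number of adversarial links to inject (perturbation budget)
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    poisoned = adj.copy()

    # Candidate pairs: opposite subgroup, opposite label, not yet connected.
    candidates = [
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if sensitive[u] != sensitive[v]
        and labels[u] != labels[v]
        and poisoned[u, v] == 0
    ]

    # Randomly pick up to `budget` candidate pairs (a real attack would
    # instead score candidates by their effect on a fairness metric).
    for i in rng.permutation(len(candidates))[:budget]:
        u, v = candidates[i]
        poisoned[u, v] = poisoned[v, u] = 1  # keep the graph undirected
    return poisoned


if __name__ == "__main__":
    # Tiny usage example on a random graph with synthetic labels/attributes.
    rng = np.random.default_rng(0)
    n = 20
    upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
    adj = upper + upper.T
    labels = rng.integers(0, 2, size=n)
    sensitive = rng.integers(0, 2, size=n)
    poisoned = inject_fairness_attack_links(adj, labels, sensitive, budget=5, seed=1)
    print("links added:", int((poisoned - adj).sum() // 2))
```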