Graph neural networks (GNNs) have shown great power in modeling graph-structured data. However, like other machine learning models, GNNs may make biased predictions with respect to protected sensitive attributes, e.g., skin color and gender, because the training data often contain historical bias toward sensitive attributes. In addition, we empirically show that the discrimination in GNNs can be magnified by graph structures and the message-passing mechanism of GNNs. As a result, the application of GNNs in high-stakes domains such as crime rate prediction would be largely limited. Although fair classification has been extensively studied for i.i.d. data, methods that address discrimination on non-i.i.d. data are rather limited. Generally, learning fair models requires abundant sensitive attributes to regularize the model. However, for many graphs such as social networks, users are reluctant to share sensitive attributes, so only limited sensitive attributes are available for fair GNN training in practice. Moreover, directly collecting and using sensitive attributes in fair model training may cause privacy issues, because the sensitive information can be leaked through data breaches or attacks on the trained model. Therefore, we study a novel and crucial problem: learning fair GNNs with limited and private sensitive attribute information. To address this problem, we propose FairGNN, which eliminates the bias of GNNs while maintaining high accuracy by leveraging graph structures and limited sensitive information. We further extend FairGNN to NT-FairGNN, which achieves both fairness and privacy on sensitive attributes by using limited and private sensitive attributes. Theoretical analysis and extensive experiments on real-world datasets demonstrate the effectiveness of FairGNN and NT-FairGNN in achieving fair and highly accurate classification.
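To make the high-level description above concrete, the following is a minimal, hedged PyTorch sketch of the general idea of adversarial debiasing for a GNN when sensitive attributes are known for only a few nodes and an estimator fills in the rest. It is not the authors' exact architecture or objective; all names (GCNLayer, FairGNNSketch, train_step, lam), the dense normalized adjacency, and the loss weighting are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's implementation) of adversarial debiasing
# for a GNN classifier with partially observed sensitive attributes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = A_norm @ X @ W."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        return adj_norm @ self.lin(x)


class FairGNNSketch(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.encoder = GCNLayer(in_dim, hid_dim)    # GNN node encoder
        self.classifier = nn.Linear(hid_dim, 1)     # task prediction head
        self.sens_estimator = GCNLayer(in_dim, 1)   # predicts missing sensitive attributes
        self.adversary = nn.Linear(hid_dim, 1)      # tries to recover the sensitive attribute

    def forward(self, x, adj_norm):
        h = F.relu(self.encoder(x, adj_norm))
        return (h,
                self.classifier(h).squeeze(-1),
                self.sens_estimator(x, adj_norm).squeeze(-1),
                self.adversary(h).squeeze(-1))


def train_step(model, opt_main, opt_adv, x, adj_norm, y, s, s_mask, lam=1.0):
    """One alternating update: train the adversary, then the debiased classifier."""
    # Step 1: the adversary learns to predict the (partly estimated) sensitive attribute.
    h, y_logit, s_logit, a_logit = model(x, adj_norm)
    s_hat = torch.sigmoid(s_logit).detach()
    s_target = torch.where(s_mask, s.float(), s_hat)  # true labels where known, estimates elsewhere
    adv_loss = F.binary_cross_entropy_with_logits(a_logit, s_target)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # Step 2: encoder/classifier/estimator minimize task and estimation losses while
    # maximizing the adversary's loss, so embeddings carry little sensitive information.
    h, y_logit, s_logit, a_logit = model(x, adj_norm)
    task_loss = F.binary_cross_entropy_with_logits(y_logit, y.float())
    est_loss = F.binary_cross_entropy_with_logits(s_logit[s_mask], s[s_mask].float())
    fair_loss = -F.binary_cross_entropy_with_logits(a_logit, s_target)
    loss = task_loss + est_loss + lam * fair_loss
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()
    return loss.item()
```

In use, opt_main would be built over the encoder, classifier, and estimator parameters, and opt_adv over the adversary's parameters only, so that the alternating updates realize the min-max game the abstract alludes to.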