The fairness of machine learning (ML) algorithms has recently attracted considerable research interest. Many ML methods exhibit bias against protected groups, which limits their applicability in sensitive domains such as crime rate prediction. Moreover, data may contain missing values which, if not handled appropriately, are known to further harm fairness. Although many imputation methods have been proposed to deal with missing data, their effect on fairness remains understudied. In this paper, we analyze how imputing missing graph data (node attributes) affects fairness, using different embedding and neural network methods. Extensive experiments on six datasets reveal severe fairness issues in missing data imputation under graph node classification. We also find that the choice of imputation method affects both fairness and accuracy. Our results provide valuable insights into fairness in graph data and into handling missingness in graphs efficiently, and they suggest directions for theoretical studies of fairness in graph data.