Nowadays, the analysis of complex phenomena modeled by graphs plays a crucial role in many real-world application domains where decisions can have a strong societal impact. However, numerous recent studies have shown that machine learning models can lead to disparate treatment of individuals and to unfair outcomes. Graph mining algorithms are not spared by fairness problems, and they raise specific challenges tied to the intrinsic nature of graphs: (1) graph data is non-IID, which may invalidate many existing studies in fair machine learning that rely on the IID assumption; (2) suitable metrics must be defined to assess the different types of fairness on relational data; and (3) finding a good trade-off between model accuracy and fairness is algorithmically difficult. This survey is the first one dedicated to fairness for relational data. It aims to present a comprehensive review of state-of-the-art techniques for fairness in graph mining and to identify open challenges and future trends. In particular, we start by presenting several sensitive application domains and the associated graph mining tasks, focusing on edge prediction and node classification in the rest of the survey. We then recall the metrics proposed to evaluate potential bias at different levels of the graph mining process, and we provide a comprehensive overview of recent contributions to fair machine learning for graphs, which we classify into pre-processing, in-processing and post-processing models. We also describe existing graph datasets, including synthetic and real-world benchmarks. Finally, we present in detail five promising directions for advancing research on algorithmic fairness on graphs.