With the introduction of machine learning in high-stakes decision making, ensuring algorithmic fairness has become an increasingly important problem. In response, many mathematical definitions of fairness have been proposed, and a variety of optimisation techniques have been developed, each designed to maximise a given notion of fairness. However, fair solutions are reliant on the quality of the training data and can be highly sensitive to noise. Recent studies have shown that robustness (the ability of a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem; hence, measuring the robustness of these strategies has become a fundamental problem. In this work, we therefore propose a new criterion to measure the robustness of various fairness optimisation strategies: the \textit{robustness ratio}. We conduct extensive experiments on five benchmark fairness data sets using three of the most popular fairness strategies with respect to four of the most popular definitions of fairness. Our experiments empirically show that fairness methods that rely on threshold optimisation are very sensitive to noise in all the evaluated data sets, despite mostly outperforming other methods. This is in contrast to the other two methods, which are less fair in low-noise scenarios but fairer in high-noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimisation strategies. This can potentially serve as a guideline for choosing the most suitable fairness strategy for various data sets.