With the introduction of machine learning in high-stakes decision making, ensuring algorithmic fairness has become an increasingly important problem to solve. In response to this, many mathematical definitions of fairness have been proposed, and a variety of optimisation techniques have been developed, all designed to maximise a defined notion of fairness. However, fair solutions are reliant on the quality of the training data, and can be highly sensitive to noise. Recent studies have shown that robustness (the ability for a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem and, hence, measuring the robustness of these strategies has become a fundamental problem. In this work, we therefore propose a new criterion to measure the robustness of various fairness optimisation strategies - the robustness ratio. We conduct multiple extensive experiments on five bench mark fairness data sets using three of the most popular fairness strategies with respect to four of the most popular definitions of fairness. Our experiments empirically show that fairness methods that rely on threshold optimisation are very sensitive to noise in all the evaluated data sets, despite mostly outperforming other methods. This is in contrast to the other two methods, which are less fair for low noise scenarios but fairer for high noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimisation strategies. This can potentially can serve as a guideline in choosing the most suitable fairness strategy for various data sets.
翻译:随着在高层决策中引入机器学习,确保算法公平已成为一个越来越重要的需要解决的问题。为此,提出了许多关于公平性的许多数学定义,并开发了各种优化技术,这些技术都旨在最大限度地实现一个定义的公平性概念。然而,公平解决方案依赖于培训数据的质量,并且对噪音具有高度敏感性。最近的研究显示,稳健性(一个模型能够很好地利用隐蔽数据)在应对新问题时应当使用的战略类型中起着重要作用,因此,衡量这些战略的稳健性已成为一个根本问题。因此,我们为此提出了许多关于公平性定义的数学定义,并开发了各种优化技术,这些技术都旨在最大限度地实现一个定义的公平性。然而,我们利用三种最受欢迎的公平性战略中的三种,对公平性数据进行多次广泛的实验。我们的实验经验表明,在应对新问题时,依赖临界性优化的公平性方法对所有评估数据集都非常敏感,尽管多数是其他方法都比得上得更稳妥,但衡量这些战略的稳妥性也已成为一个根本性的正确性。这与我们最有可能选择的其他两种方法相比,这是更公平的评估。