A Sandbox Tool to Bias(Stress)-Test Fairness Algorithms

Motivated by the growing importance of reducing unfairness in ML predictions, Fair-ML researchers have presented an extensive suite of algorithmic 'fairness-enhancing' remedies. Most existing algorithms, however, are agnostic to the sources of the observed unfairness. As a result, the literature currently lacks guiding frameworks that specify the conditions under which each algorithmic intervention can alleviate the underlying cause of unfairness. To close this gap, we scrutinize the underlying biases (e.g., in the training data or design choices) that cause observational unfairness. We present the conceptual idea and a first implementation of a bias-injection sandbox tool to investigate the fairness consequences of various biases and to assess the effectiveness of algorithmic remedies in the presence of specific types of bias. We call this process the bias(stress)-testing of algorithmic interventions. Unlike existing toolkits, ours provides a controlled environment for counterfactually injecting biases into the ML pipeline. This stylized setup offers the distinct capability of testing fairness interventions beyond observational data and against an unbiased benchmark. In particular, we can test whether a given remedy alleviates the injected bias by comparing the predictions obtained after the intervention in the biased setting with the true labels in the unbiased regime, that is, before any bias injection. We illustrate the utility of our toolkit via a proof-of-concept case study on synthetic data. Our empirical analysis showcases the type of insights that can be obtained through our simulations.
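The evaluation loop described above (inject a bias, train, apply a remedy, then score against the labels from before the injection) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the authors' tool: the synthetic data generator, the particular bias (a stricter labeling bar for one group), the per-group threshold classifier, and the use of a simple selection-rate-matching post-processing step as a stand-in "remedy".

```python
import random

def make_data(n, seed=0):
    """Unbiased synthetic data: (group a, score x, true label y = 1 if x > 0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.randint(0, 1)
        x = rng.gauss(0.0, 1.0)
        rows.append((a, x, int(x > 0.0)))
    return rows

def inject_label_bias(rows, extra_bar=0.5):
    """Hypothetical bias: group 1 needs x > extra_bar to receive a positive label."""
    return [(a, x, int(x > (extra_bar if a == 1 else 0.0))) for a, x, _ in rows]

def fit_thresholds(rows):
    """Group-aware threshold classifier: per group, pick the cut-off that
    minimizes training error on the (possibly biased) labels."""
    thresholds = {}
    for g in (0, 1):
        pts = [(x, y) for a, x, y in rows if a == g]
        candidates = sorted(x for x, _ in pts)
        thresholds[g] = min(
            candidates,
            key=lambda t: sum(int(x >= t) != y for x, y in pts),
        )
    return thresholds

def selection_rate(rows, thresholds, g):
    """Fraction of group g that the classifier predicts positive."""
    preds = [int(x >= thresholds[a]) for a, x, _ in rows if a == g]
    return sum(preds) / len(preds)

def parity_postprocess(rows, thresholds):
    """Stand-in remedy: move group 1's threshold so that its selection rate
    matches group 0's. Uses features and group membership only, no labels."""
    r0 = selection_rate(rows, thresholds, 0)
    xs = sorted((x for a, x, _ in rows if a == 1), reverse=True)
    k = round(r0 * len(xs))
    adjusted = dict(thresholds)
    adjusted[1] = xs[k - 1] if k > 0 else float("inf")
    return adjusted

def evaluate(clean_rows, thresholds):
    """Accuracy against the unbiased labels, plus the selection-rate gap."""
    correct = [int(x >= thresholds[a]) == y for a, x, y in clean_rows]
    acc = sum(correct) / len(correct)
    gap = abs(selection_rate(clean_rows, thresholds, 0)
              - selection_rate(clean_rows, thresholds, 1))
    return acc, gap

def run_stress_test(n=2000, seed=0):
    clean = make_data(n, seed)             # unbiased regime (benchmark)
    biased = inject_label_bias(clean)      # counterfactual bias injection
    model = fit_thresholds(biased)         # model trained on biased labels
    remedied = parity_postprocess(biased, model)  # intervention
    acc_before, gap_before = evaluate(clean, model)
    acc_after, gap_after = evaluate(clean, remedied)
    return {"acc_before": acc_before, "gap_before": gap_before,
            "acc_after": acc_after, "gap_after": gap_after}

if __name__ == "__main__":
    for name, value in run_stress_test().items():
        print(f"{name}: {value:.3f}")
```

In this toy setup the remedy helps only because both groups share the same score distribution by construction, so equalizing selection rates happens to undo the injected labeling bias. Under a different injected bias the same remedy might not recover the unbiased labels, which is the kind of question such a sandbox is meant to probe.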
