The error of an estimator can be decomposed into a (statistical) bias term, a variance term, and an irreducible noise term. When we perform bias analysis, we are formally asking: "How good are the predictions?" The role of bias in the error decomposition is clear: if we trust the labels/targets, then we want the estimator to have as low a bias as possible in order to minimize error. Fair machine learning is concerned with the question: "Are the predictions equally good for different demographic/social groups?" This has naturally led to a variety of fairness metrics that compare some measure of statistical bias on subsets corresponding to socially privileged and socially disadvantaged groups. In this paper we propose a new family of performance measures based on group-wise parity in variance. We demonstrate when group-wise statistical bias analysis gives an incomplete picture, and what group-wise variance analysis can tell us in settings that differ in the magnitude of statistical bias. We develop and release an open-source library that reconciles uncertainty quantification techniques with fairness analysis, and use it to conduct an extensive empirical analysis of our variance-based fairness measures on standard benchmarks.
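For reference, the decomposition the abstract invokes can be written in its standard textbook form for squared-error loss (this is the classical identity, not a formula specific to this paper), and one illustrative way to instantiate "group-wise parity in variance" is to compare the average predictive variance across two groups; the symbols $\Delta_{\mathrm{var}}$, $G_a$, and $G_b$ below are our own notation and need not match the measures defined in the paper:

\[
\mathbb{E}_{\mathcal{D}}\!\big[(y - \hat{f}(x))^2\big]
\;=\;
\underbrace{\big(\mathbb{E}_{\mathcal{D}}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
\;+\;
\underbrace{\mathbb{E}_{\mathcal{D}}\!\big[\big(\hat{f}(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
\;+\;
\underbrace{\sigma^2}_{\text{irreducible noise}}
\]

\[
\Delta_{\mathrm{var}}
\;=\;
\Big|\,\mathbb{E}_{x \sim G_a}\!\big[\mathrm{Var}_{\mathcal{D}}(\hat{f}(x))\big]
\;-\;
\mathbb{E}_{x \sim G_b}\!\big[\mathrm{Var}_{\mathcal{D}}(\hat{f}(x))\big]\,\Big|
\]

Here $\mathcal{D}$ denotes the randomness over training sets (or over an ensemble/resampling procedure used to estimate variance in practice), and $G_a$, $G_b$ are the socially privileged and socially disadvantaged groups being compared; a value of $\Delta_{\mathrm{var}}$ near zero would correspond to group-wise parity in variance under this illustrative definition.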