With the widespread deployment of large-scale prediction systems in high-stakes domains such as face recognition and criminal justice, disparity in prediction accuracy between demographic subgroups has called for a fundamental understanding of its source and for algorithmic interventions to mitigate it. In this paper, we study the accuracy disparity problem in regression. We first propose an error decomposition theorem, which decomposes the accuracy disparity into the distance between marginal label distributions and the distance between conditional representations, to help explain why such accuracy disparity appears in practice. Motivated by this error decomposition and the general idea of distribution alignment with statistical distances, we then propose an algorithm to reduce this disparity and analyze the game-theoretic optima of the proposed objective functions. To corroborate our theoretical findings, we also conduct experiments on five benchmark datasets. The experimental results suggest that our proposed algorithms can effectively mitigate accuracy disparity while maintaining the predictive power of the regression models.