Artificial Intelligence (AI) applications increasingly inform decisions that affect many aspects of human life. Society responds by imposing legal and social expectations for the accountability of such automated decision systems (ADSs). Fairness, a fundamental constituent of AI accountability, is concerned with the just treatment of individuals and sensitive groups (e.g., defined by sex or race). While many studies focus on fair learning and fairness testing for classification tasks, the literature on how to examine fairness in regression tasks is rather limited. This work presents error parity as a regression fairness notion and introduces a testing methodology to assess group fairness based on a statistical hypothesis testing procedure. The error parity test checks whether prediction errors are distributed similarly across sensitive groups to determine whether an ADS is fair. It is followed by a suitable permutation test that compares groups on several statistics to explore disparities and identify impacted groups. The usefulness and applicability of the proposed methodology are demonstrated via a case study on county-level COVID-19 projections in the US, which revealed race-based differences in forecast errors. Overall, the proposed regression fairness testing methodology fills a gap in the fair machine learning literature and may serve as part of larger accountability assessments and algorithm audits.
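As a rough illustration of the permutation-testing step described above, the sketch below compares a summary statistic of prediction errors (here the group mean, by default) between two sensitive groups via label permutation. The function name, arguments, and the choice of statistic are illustrative assumptions, not the paper's exact procedure; a full error parity assessment would also compare the error distributions themselves.

```python
import numpy as np

def permutation_error_parity(errors, groups, stat=np.mean, n_perm=10_000, seed=0):
    """Two-group permutation test on a summary statistic of prediction errors.

    errors : 1-D array of per-sample prediction errors (e.g., absolute errors)
    groups : 1-D array of group labels aligned with `errors`
    stat   : statistic compared across groups (mean by default)
    Returns the observed between-group difference and a two-sided p-value.
    """
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    assert labels.size == 2, "this sketch handles exactly two sensitive groups"

    def group_diff(g):
        # Difference of the chosen statistic between the two groups.
        return stat(errors[g == labels[0]]) - stat(errors[g == labels[1]])

    observed = group_diff(groups)
    perm_diffs = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle group labels to simulate the null of no group-based disparity.
        perm_diffs[i] = group_diff(rng.permutation(groups))
    p_value = np.mean(np.abs(perm_diffs) >= np.abs(observed))
    return observed, p_value
```

A small p-value under this sketch suggests the chosen error statistic differs across groups more than chance permutation would explain, flagging a potential disparity for the group with the larger errors.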