AI systems can create, propagate, support, and automate bias in decision-making processes. To mitigate biased decisions, we need to both understand the origin of the bias and define what it means for an algorithm to make fair decisions. Most group fairness notions assess a model's equality of outcome by computing statistical metrics on the outputs. We argue that these output metrics encounter intrinsic obstacles and present a complementary approach that aligns with the increasing focus on equality of treatment. By Locating Unfairness through Canonical Inverse Design (LUCID), we generate a canonical set that shows the desired inputs for a model given a preferred output. The canonical set reveals the model's internal logic and exposes potential unethical biases by repeatedly interrogating the decision-making process. We evaluate LUCID on the UCI Adult and COMPAS data sets and find that some biases detected by a canonical set differ from those of output metrics. The results show that by shifting the focus towards equality of treatment and looking into the algorithm's internal workings, canonical sets are a valuable addition to the toolbox of algorithmic fairness evaluation.
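To make the idea of canonical inverse design concrete, the sketch below shows one plausible way to generate a canonical set: start from random candidate inputs and run gradient ascent on the inputs so that a fixed, trained classifier assigns them the preferred output, then inspect the resulting inputs for skew in a sensitive feature. This is a minimal illustration under assumed choices (a hypothetical logistic-regression model, made-up feature indices, and arbitrary hyperparameters), not the paper's exact procedure.

```python
# Illustrative sketch of canonical inverse design (hedged, not the authors' exact code):
# optimise the *inputs* of a fixed model toward its preferred output and inspect the
# resulting "canonical set" for signs of unequal treatment.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trained classifier: p(y=1 | x) = sigmoid(x @ w + b).
# In practice w, b would come from training on e.g. UCI Adult; here they are stand-ins.
n_features = 5
w = rng.normal(size=n_features)
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X):
    """Probability of the preferred class for each row of X."""
    return sigmoid(X @ w + b)

def canonical_set(n_samples=256, steps=200, lr=0.1):
    """Gradient ascent on random starting inputs toward the preferred output."""
    X = rng.normal(size=(n_samples, n_features))
    for _ in range(steps):
        p = predict(X)
        # For this model, d log p(y=1|x) / dx = (1 - p) * w.
        grad = (1.0 - p)[:, None] * w[None, :]
        X += lr * grad
    return X

canon = canonical_set()
print("mean preferred-class probability:", predict(canon).mean())
# Inspecting a (hypothetical) sensitive feature at column 0: a strong shift in its
# distribution over the canonical set would hint at unequal treatment by the model.
print("mean of feature 0 over the canonical set:", canon[:, 0].mean())
```

The design choice illustrated here is that the model is held fixed and only the inputs are optimised, so the canonical set reflects what the model "asks for" to grant the preferred outcome; fairness inspection then happens on the distribution of sensitive features within that set rather than on output statistics.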