Fairness Interventions: A Study in AI Explainability

This paper presents a philosophical and experimental study of fairness interventions in AI classification, centered on the explainability of corrective methods. We argue that ensuring fairness requires not only satisfying a target criterion but also explaining which variables constrain its realization. When corrections are used to mitigate advantage transparently, they must remain sensitive to the distribution of true labels. To illustrate this approach, we built FairDream, a fairness package whose mechanism is transparent to lay users: it increases the weight the model assigns to errors on disadvantaged groups. Although a user may intend the correction to achieve Demographic Parity, our experiments show that FairDream tends towards Equalized Odds, revealing a conservative bias inherent in the data environment. We clarify the relationship between these two fairness criteria, analyze FairDream's reweighting process, and compare its trade-offs with those of closely related GridSearch models. Finally, we justify the normative preference for Equalized Odds through an epistemological interpretation of the results, drawing on their proximity to Simpson's paradox. The paper thus unites normative, epistemological, and empirical explanations of fairness interventions in order to ensure transparency for users.
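
To make the contrast between the two criteria concrete, the following is a minimal sketch using their standard definitions; it is not FairDream's implementation. Demographic Parity compares selection rates across groups, whereas Equalized Odds compares true-positive and false-positive rates. The function names (`group_rates`, `gap`) and the toy data are assumptions introduced purely for illustration. In the toy data, the two groups have different base rates of positive labels, so a classifier that predicts every label correctly satisfies Equalized Odds while still violating Demographic Parity.

```python
from collections import defaultdict

def rate(values):
    """Mean of a list of 0/1 values, or None if the list is empty."""
    return sum(values) / len(values) if values else None

def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true-positive rate, and false-positive rate."""
    buckets = defaultdict(lambda: {"all": [], "pos": [], "neg": []})
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g]["all"].append(p)                        # every prediction
        buckets[g]["pos" if t == 1 else "neg"].append(p)   # split by true label
    return {
        g: {
            "selection": rate(b["all"]),  # P(pred = 1 | group)
            "tpr": rate(b["pos"]),        # P(pred = 1 | label = 1, group)
            "fpr": rate(b["neg"]),        # P(pred = 1 | label = 0, group)
        }
        for g, b in buckets.items()
    }

def gap(rates, key):
    """Largest between-group difference for one rate (0 means parity)."""
    values = [r[key] for r in rates.values() if r[key] is not None]
    return max(values) - min(values)

# Hypothetical toy data: group A has a 50% base rate, group B a 25% base rate.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_pred = list(y_true)  # a classifier that makes no errors

rates = group_rates(y_true, y_pred, groups)
dp_gap = gap(rates, "selection")                    # Demographic Parity gap
eo_gap = max(gap(rates, "tpr"), gap(rates, "fpr"))  # Equalized Odds gap

print(f"Demographic Parity gap: {dp_gap}")
print(f"Equalized Odds gap: {eo_gap}")
```

Under these assumptions the Equalized Odds gap is zero while the Demographic Parity gap equals the difference in base rates, which is one way to see why a criterion conditioned on true labels stays sensitive to their distribution.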
