In this paper, we propose a philosophical and experimental investigation of the problem of AI fairness in classification. We argue that implementing fairness in AI classification involves more work than operationalizing a fairness metric: it requires establishing the explainability of the classification model chosen and of the principles behind it. Specifically, it involves making the training processes transparent, determining what outcomes the fairness criteria actually produce, and assessing their trade-offs by comparison with closely related models that would lead to a different outcome. To exemplify this methodology, we trained a model and developed a tool for disparity detection and fairness interventions, the package FairDream. While FairDream is set to enforce Demographic Parity, experiments reveal that it instead fulfills the constraint of Equalized Odds: the algorithm is thus more conservative than the user might expect. To justify this outcome, we first clarify the relation between Demographic Parity and Equalized Odds as fairness criteria. We then explain FairDream's reweighting method and justify the trade-offs it reaches through a benchmark comparison with closely related GridSearch models. We draw conclusions regarding the way in which these explanatory steps can make an AI model trustworthy.
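For reference, the two criteria contrasted in the abstract can be stated in their standard form; the notation below ($\hat{Y}$ for the model's prediction, $Y$ for the true label, $A$ for the sensitive attribute with groups $a, b$) is ours, not necessarily the paper's:
\[
\text{Demographic Parity:}\quad P(\hat{Y}=1 \mid A=a) = P(\hat{Y}=1 \mid A=b)
\]
\[
\text{Equalized Odds:}\quad P(\hat{Y}=1 \mid A=a,\, Y=y) = P(\hat{Y}=1 \mid A=b,\, Y=y), \quad y \in \{0,1\}
\]
Demographic Parity equalizes the overall rate of positive predictions across groups, whereas Equalized Odds equalizes true positive and false positive rates by conditioning on the true label, which is the sense in which it is the more conservative constraint.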