On Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making

Proponents of explainable AI have often argued that it constitutes an essential path towards algorithmic fairness. Prior work examining these claims has primarily evaluated explanations based on their effects on humans' perceptions, but there is scant research on the relationship between explanations and the distributive fairness of AI-assisted decisions. In this paper, we conduct an empirical study to examine the relationship between feature-based explanations and distributive fairness, mediated by human perceptions and reliance on AI recommendations. Our findings show that explanations influence fairness perceptions, which, in turn, relate to humans' tendency to adhere to AI recommendations. However, our findings suggest that such explanations do not enable humans to distinguish correct from incorrect AI recommendations. Instead, we show that they may affect reliance irrespective of the correctness of AI recommendations. Depending on which features an explanation highlights, this can foster or hinder distributive fairness: when explanations highlight features that are task-irrelevant and evidently associated with the sensitive attribute, this prompts overrides that counter stereotype-aligned AI recommendations. Meanwhile, if explanations appear task-relevant, this induces reliance behavior that reinforces stereotype-aligned errors. These results show that feature-based explanations are not a reliable mechanism to improve distributive fairness, as their ability to do so relies on a human-in-the-loop operationalization of the flawed notion of "fairness through unawareness". Finally, our study design provides a blueprint to evaluate the suitability of other explanations as pathways towards improved distributive fairness of AI-assisted decisions.
