Various tools and practices have been developed to support practitioners in identifying, assessing, and mitigating fairness-related harms caused by AI systems. However, prior research has highlighted gaps between the intended design of these tools and practices and their use within particular contexts, including gaps caused by the role that organizational factors play in shaping fairness work. In this paper, we investigate these gaps for one such practice: disaggregated evaluations of AI systems, intended to uncover performance disparities between demographic groups. By conducting semi-structured interviews and structured workshops with thirty-three AI practitioners from ten teams at three technology companies, we identify practitioners' processes, challenges, and needs for support when designing disaggregated evaluations. We find that practitioners face challenges when choosing performance metrics, identifying the most relevant direct stakeholders and demographic groups on which to focus, and collecting datasets with which to conduct disaggregated evaluations. More generally, we identify impacts on fairness work stemming from a lack of engagement with direct stakeholders or domain experts, business imperatives that prioritize customers over marginalized groups, and the drive to deploy AI systems at scale.
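To make the central practice concrete: a disaggregated evaluation computes a performance metric separately for each demographic group, rather than a single aggregate score, so that disparities become visible. Below is a minimal illustrative sketch (not drawn from the paper) assuming pandas and scikit-learn; the column names and the accuracy metric are hypothetical choices, and, as the paper emphasizes, selecting appropriate metrics, groups, and datasets is itself a substantial challenge.

```python
import pandas as pd
from sklearn.metrics import accuracy_score


def disaggregated_accuracy(df, group_col, label_col, pred_col):
    """Compute a per-group accuracy table (one row per demographic group)."""
    rows = []
    for group, subset in df.groupby(group_col):
        rows.append({
            "group": group,
            "n": len(subset),  # group sizes matter when interpreting disparities
            "accuracy": accuracy_score(subset[label_col], subset[pred_col]),
        })
    return pd.DataFrame(rows)


# Hypothetical evaluation data: true labels, model predictions, and a
# demographic attribute for each example.
df = pd.DataFrame({
    "label": [1, 0, 1, 1, 0, 1, 0, 0],
    "pred":  [1, 0, 0, 1, 0, 1, 1, 0],
    "group": ["A", "A", "A", "B", "B", "B", "B", "B"],
})
print(disaggregated_accuracy(df, "group", "label", "pred"))
```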