Several studies have uncovered performance disparities in AI systems by conducting "disaggregated evaluations," which measure a system's performance separately for different groups of people rather than only in aggregate. We build on these efforts by focusing on the choices that must be made when designing a disaggregated evaluation, some of the key considerations that underlie these choices, and the tradeoffs between those considerations. We argue that a deeper understanding of the choices, considerations, and tradeoffs involved in designing disaggregated evaluations will better enable researchers, practitioners, and the public to understand the ways in which AI systems may underperform for particular groups of people.
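As a rough illustration of what "disaggregated" means in practice, the sketch below computes one metric overall and then separately for each group. It is not the authors' method. It assumes accuracy as the metric and a single group label per example, and the function names (`disaggregated_accuracy`, `max_disparity`) are hypothetical. Choosing the metric, the groups, and how to summarize the gaps between them are exactly the kinds of design choices the abstract refers to.

```python
from collections import defaultdict


def disaggregated_accuracy(y_true, y_pred, groups):
    """Return (overall accuracy, {group: accuracy}) for parallel label lists.

    Hypothetical helper: assumes accuracy is the metric of interest and that
    each example carries exactly one group label.
    """
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred, and groups must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    correct = defaultdict(int)  # number of correct predictions per group
    total = defaultdict(int)    # number of examples per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)

    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_group


def max_disparity(per_group):
    """Largest gap between the best- and worst-performing groups."""
    values = list(per_group.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Toy labels, made up purely for illustration.
    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 0]
    groups = ["a", "a", "b", "b", "a", "b"]
    overall, per_group = disaggregated_accuracy(y_true, y_pred, groups)
    print("overall:", overall)
    print("per group:", per_group)
    print("max disparity:", max_disparity(per_group))
```

On these toy labels, the overall accuracy is two thirds, which hides the fact that group "a" gets every prediction right while group "b" gets only one in three. Surfacing that kind of gap is the purpose of a disaggregated evaluation. A max-minus-min gap is only one of several possible ways to summarize it.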