RecSys Fairness Metrics: Many to Use But Which One To Choose?

In recent years, recommendation and ranking systems have become increasingly popular on digital platforms. However, previous work has highlighted how personalized systems might lead to unintentional harms for users. Practitioners require metrics to measure and mitigate these types of harms in production systems. To meet this need, many fairness definitions have been introduced and explored by the RecSys community. Unfortunately, this has led to a proliferation of possible fairness metrics from which practitioners can choose. The increase in volume and complexity of metrics creates a need for practitioners to deeply understand the nuances of fairness definitions and implementations. Additionally, practitioners need to understand the ethical guidelines that accompany these metrics for responsible implementation. Recent work has shown that there is a proliferation of ethics guidelines and has pointed to the need for more implementation guidance rather than principles alone. The wide variety of available metrics, coupled with the lack of accepted standards or shared knowledge in practice, leads to a challenging environment for practitioners to navigate. In this position paper, we focus on this widening gap between the research community and practitioners concerning the availability of metrics versus the ability to put them into practice. We address this gap with our current work, which focuses on developing methods to help ML practitioners in their decision-making processes when choosing fairness metrics for recommendation and ranking systems. In our iterative design interviews, we have already found that practitioners need both practical and reflective guidance when refining fairness constraints. This is especially salient given the growing challenge for practitioners to select the correct metrics while balancing complex fairness contexts.
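To make the point about nuance concrete, the sketch below compares two group-fairness measures on the same ranked list. The abstract does not name specific metrics; the two used here, the share of top-k slots held by a group and the share of position-discounted exposure a group receives (assuming a DCG-style discount of 1 / log2(rank + 1)), are illustrative choices, and the function names `top_k_share` and `exposure_share` are hypothetical.

```python
import math
from typing import Sequence


def top_k_share(groups: Sequence[str], group: str, k: int) -> float:
    """Fraction of the top-k slots occupied by items from `group` (assumes k >= 1)."""
    top = groups[:k]
    return sum(1 for g in top if g == group) / len(top)


def exposure_share(groups: Sequence[str], group: str, k: int) -> float:
    """Share of position-discounted exposure in the top k that goes to `group`.

    Illustrative assumption: exposure at rank r (starting at 1) is 1 / log2(r + 1),
    the same discount used in DCG.
    """
    n = min(k, len(groups))
    weights = [1.0 / math.log2(rank + 1) for rank in range(1, n + 1)]
    total = sum(weights)
    # zip stops at the shorter sequence, so only the top-n items contribute.
    mine = sum(w for w, g in zip(weights, groups) if g == group)
    return mine / total


if __name__ == "__main__":
    # Hypothetical ranking: each entry is the group label of the item at that rank.
    ranking = ["A", "A", "B", "B", "A", "B"]
    k = 4
    print(f"top-{k} share of B:    {top_k_share(ranking, 'B', k):.3f}")
    print(f"exposure share of B: {exposure_share(ranking, 'B', k):.3f}")
```

On this toy ranking, group B holds half of the top-4 slots (0.500), so a representation-based check passes, yet it receives only about 0.363 of the discounted exposure because its items sit at ranks 3 and 4. Which verdict matters depends on the fairness context, which is the kind of judgment the abstract argues practitioners need guidance on.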