Policy-Gradient Training of Fair and Unbiased Ranking Functions

While implicit feedback (e.g., clicks and dwell times) is an abundant and attractive source of data for learning to rank, it can produce unfair ranking policies for both exogenous and endogenous reasons. Exogenous reasons typically manifest themselves as biases in the training data, which then get reflected in the learned ranking policy and often lead to rich-get-richer dynamics. Moreover, even after such biases are corrected, reasons endogenous to the design of the learning algorithm can still lead to ranking policies that do not allocate exposure among items in a fair way. To address both exogenous and endogenous sources of unfairness, we present the first learning-to-rank approach that simultaneously addresses presentation bias and merit-based fairness of exposure. Specifically, we define a class of amortized fairness-of-exposure constraints that can be chosen based on the needs of an application, and we show how these fairness criteria can be enforced despite the selection biases in implicit feedback data. The key result is an efficient and flexible policy-gradient algorithm, called FULTR, which is the first to enable the use of counterfactual estimators for both utility estimation and fairness constraints. Beyond the theoretical justification of the framework, we show empirically that the proposed algorithm can learn accurate and fair ranking policies from biased and noisy feedback.
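The abstract names the ingredients only at a high level. As a rough illustration of how they can fit together, the sketch below combines (i) an inverse-propensity-scored (IPS) estimate of item merit from position-biased clicks, (ii) a Plackett-Luce ranking policy, and (iii) a REINFORCE-style policy-gradient step on expected utility minus a squared penalty on the gap in exposure-per-merit between two groups. The exposure model 1/log2(1 + rank), the assumption that examination propensities are known and equal to those exposure weights, the two-group disparity, the squared penalty (in place of a constraint), and all function names are hypothetical choices made for illustration; this is not the FULTR algorithm from the paper.

```python
import math
import random

# Illustrative sketch only: the exposure model, the two-group disparity,
# the squared penalty, and every function name here are assumptions.


def exposure(rank):
    """Assumed position-based exposure (and examination propensity) for a 1-based rank."""
    return 1.0 / math.log2(1 + rank)


def ips_merit(click_logs, num_items):
    """IPS estimate of per-item merit from logged (item, rank, clicked) triples.

    Each click is divided by the assumed examination propensity of the rank
    it was shown at, which corrects for position (presentation) bias.
    """
    totals = [0.0] * num_items
    counts = [0] * num_items
    for item, rank, clicked in click_logs:
        totals[item] += clicked / exposure(rank)
        counts[item] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def sample_ranking(scores, rng):
    """Sample a full ranking from a Plackett-Luce policy with the given scores."""
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        weights = [math.exp(scores[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


def grad_log_prob(scores, ranking):
    """Gradient of log P(ranking) under Plackett-Luce with respect to the scores."""
    grad = [0.0] * len(scores)
    for pos in range(len(ranking)):
        remaining = ranking[pos:]
        z = sum(math.exp(scores[i]) for i in remaining)
        grad[ranking[pos]] += 1.0
        for i in remaining:
            grad[i] -= math.exp(scores[i]) / z
    return grad


def utility_and_disparity(ranking, merit, groups):
    """DCG-style utility and group 0's minus group 1's exposure-per-merit."""
    util = 0.0
    group_exposure = [0.0, 0.0]
    for rank, item in enumerate(ranking, start=1):
        util += exposure(rank) * merit[item]
        group_exposure[groups[item]] += exposure(rank)
    # Guard against a group with zero estimated merit.
    group_merit = [max(sum(m for m, g in zip(merit, groups) if g == k), 1e-9) for k in (0, 1)]
    disparity = group_exposure[0] / group_merit[0] - group_exposure[1] / group_merit[1]
    return util, disparity


def policy_gradient_step(scores, merit, groups, lam=1.0, lr=0.1, num_samples=200, rng=None):
    """One REINFORCE ascent step on E[U] - lam * E[D]**2; returns (new_scores, mean_disparity).

    grad = E[U * g] - 2 * lam * E[D] * E[D * g], with g = grad log P(ranking).
    E[D] is a plug-in estimate from the same samples, so the step is
    consistent but not exactly unbiased.
    """
    if rng is None:
        rng = random.Random(0)
    samples = []
    for _ in range(num_samples):
        ranking = sample_ranking(scores, rng)
        util, disp = utility_and_disparity(ranking, merit, groups)
        samples.append((grad_log_prob(scores, ranking), util, disp))
    mean_disp = sum(d for _, _, d in samples) / num_samples
    grad = [0.0] * len(scores)
    for g, util, disp in samples:
        weight = util - 2.0 * lam * mean_disp * disp
        for k, gk in enumerate(g):
            grad[k] += weight * gk / num_samples
    return [s + lr * gk for s, gk in zip(scores, grad)], mean_disp
```

In use, one would estimate `merit` once with `ips_merit` from logged clicks and then call `policy_gradient_step` repeatedly; raising `lam` trades ranking utility for a smaller exposure disparity. Feeding the IPS estimate into both the utility and the disparity term mirrors the abstract's point that the fairness criterion, not only the utility, must be estimated counterfactually from biased feedback.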
