Recent work has proposed stochastic Plackett-Luce (PL) ranking models as a robust choice for optimizing relevance and fairness metrics. Unlike their deterministic counterparts that require heuristic optimization algorithms, PL models are fully differentiable. Theoretically, they can be used to optimize ranking metrics via stochastic gradient descent. However, in practice, the computation of the gradient is infeasible because it requires one to iterate over all possible permutations of items. Consequently, actual applications rely on approximating the gradient via sampling techniques. In this paper, we introduce a novel algorithm: PL-Rank, that estimates the gradient of a PL ranking model w.r.t. both relevance and fairness metrics. Unlike existing approaches that are based on policy gradients, PL-Rank makes use of the specific structure of PL models and ranking metrics. Our experimental analysis shows that PL-Rank has a greater sample-efficiency and is computationally less costly than existing policy gradients, resulting in faster convergence at higher performance. PL-Rank further enables the industry to apply PL models for more relevant and fairer real-world ranking systems.
翻译:最近的工作提出了Stochacist Plackett-Luce(PL)排名模型,作为优化相关性和公平度度的可靠选择。与需要超效优化算法的确定式对等模型不同,PL模型是完全不同的。理论上,这些模型可以用来通过随机梯度梯度下降优化评级尺度。然而,在实践中,梯度的计算是不可行的,因为它要求对所有可能的物品变化进行循环。因此,实际应用依赖于通过抽样技术对梯度进行近似。在本文中,我们引入了一种新型算法:PL-Rank,该算法估计了PL排名模型的梯度(w.r.t.),该算法对相关性和公平度指标都进行了估计。与基于政策梯度的现有方法不同,PL-Rank利用了PL模型的具体结构以及等级指标。我们的实验分析表明,PL-Rank的采样效率更高,其计算成本低于现有的政策梯度,因此在更高的绩效上更快的趋同。PL-Rank进一步使工业能够将PL模型应用于更相关和更公平的现实世界等级系统。