Traditionally, recommender systems return to each user a set of items ranked in order of estimated relevance to that user. In recent years, methods relying on stochastic ordering have been developed to create "fairer" rankings that reduce inequality in who or what is shown to users. Complete randomization (ordering candidate items randomly, independent of estimated relevance) is widely regarded as a baseline procedure that yields the most equal distribution of exposure. In industry settings, recommender systems often operate via a two-step process: candidate items are first produced using computationally inexpensive methods, and a full ranking model is then applied only to those candidates. In this paper, we consider the effects of inequality at the first step and show that, paradoxically, complete randomization at the second step can result in a higher degree of inequality than deterministic ordering of items by estimated relevance scores. In light of this observation, we propose a simple post-processing algorithm for reducing exposure inequality that works both when candidate sets are highly imbalanced and when they are not. The efficacy of our method is illustrated on both simulated data and a common benchmark data set used in studying fairness in recommender systems.
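The paradox described above can be seen in a toy example. The sketch below is only an illustration under stated assumptions, not the paper's experimental setup or its proposed algorithm: the users, items, relevance scores, and position weights are all hypothetical, and the Gini coefficient is assumed as the inequality measure. Item `P` appears in every candidate set (an imbalanced first step) but is scored low by the ranker, so deterministic ordering happens to offset the imbalance while uniform random ordering passes it through.

```python
from collections import defaultdict

# Assumed exposure received at rank 1 and rank 2 (hypothetical values).
POSITION_WEIGHTS = [0.75, 0.25]

# Hypothetical first-step output: per-user candidates with estimated relevance.
# Item "P" is in every candidate set; "a", "b", "c" each appear only once.
CANDIDATES = {
    "u1": {"P": 0.2, "a": 0.9},
    "u2": {"P": 0.2, "b": 0.9},
    "u3": {"P": 0.2, "c": 0.9},
}


def deterministic_exposure(candidates, weights):
    """Total exposure per item when candidates are sorted by relevance."""
    exposure = defaultdict(float)
    for scores in candidates.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        for item, weight in zip(ranked, weights):
            exposure[item] += weight
    return dict(exposure)


def randomized_exposure(candidates, weights):
    """Expected exposure per item under a uniformly random ordering.

    Each of the m candidates is equally likely to land in any position,
    so each receives sum(weights[:m]) / m in expectation.
    """
    exposure = defaultdict(float)
    for scores in candidates.values():
        m = len(scores)
        share = sum(weights[:m]) / m
        for item in scores:
            exposure[item] += share
    return dict(exposure)


def gini(values):
    """Gini coefficient: 0 means perfectly equal exposure."""
    xs = list(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    diff_sum = sum(abs(x - y) for x in xs for y in xs)
    return diff_sum / (2 * n * total)


if __name__ == "__main__":
    det = deterministic_exposure(CANDIDATES, POSITION_WEIGHTS)
    rnd = randomized_exposure(CANDIDATES, POSITION_WEIGHTS)
    print("deterministic Gini:", gini(det.values()))  # 0.0
    print("randomized Gini:   ", gini(rnd.values()))  # 0.25
```

In this contrived case every item receives 0.75 units of exposure under deterministic ordering, whereas randomization gives `P` 1.5 units and each other item 0.5. The effect depends entirely on the chosen scores and weights; other configurations can favor randomization.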