We consider the problem of sequential evaluation, in which an evaluator observes candidates in a sequence and assigns scores to them in an online, irrevocable fashion. Motivated by the psychology literature on sequential bias in such settings -- namely, dependencies between the evaluation outcome and the order in which the candidates appear -- we propose a natural model for the evaluator's rating process that captures the lack of calibration inherent to this task. We conduct crowdsourcing experiments that demonstrate various facets of our model. We then study how to correct sequential bias under our model by posing the correction as a statistical inference problem. We propose a near-linear-time, online algorithm for this task and prove guarantees in terms of two canonical ranking metrics. We also prove that our algorithm is information-theoretically optimal by establishing matching lower bounds in both metrics. Finally, we show that our algorithm outperforms the de facto method of ranking candidates by their reported scores.