In peer review, reviewers are usually asked to provide scores for the papers they review. Area Chairs or Program Chairs then use these scores in various ways during decision making. The scores are usually elicited in quantized form to accommodate the limited ability of humans to express their opinions as numerical values. However, quantized scores have been found to suffer from a large number of ties, which leads to a significant loss of information. To mitigate this issue, conferences have started asking reviewers to additionally provide a ranking of the papers they have reviewed. There are, however, two key challenges. First, there is no standard procedure for using this ranking information, and Area Chairs may use it in different ways (including simply ignoring it), which leads to arbitrariness in the peer-review process. Second, there are no suitable interfaces for judicious use of this data, nor methods to incorporate it into existing workflows, which leads to inefficiencies. We take a principled approach to integrating the ranking information into the scores. The output of our method is an updated score for each review that also incorporates the rankings. Our approach addresses the two aforementioned challenges by (i) ensuring that rankings are incorporated into the updated scores in the same manner for all papers, thereby mitigating arbitrariness, and (ii) allowing existing interfaces and workflows designed for scores to be used seamlessly. We empirically evaluate our method on synthetic datasets as well as on peer reviews from the ICLR 2017 conference, and find that on the ICLR 2017 data it reduces the error by approximately 30% compared to the best-performing baseline.
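The abstract does not spell out how rankings and scores are combined. One simple way to make the idea concrete is shown below. It is purely illustrative and not necessarily the authors' method. For a single reviewer, it finds the scores closest in squared error to the reviewer's original scores that are consistent with the reviewer's ranking. This is isotonic regression, solved here with the pool-adjacent-violators algorithm. The helper names `pav_nonincreasing` and `integrate_ranking` are hypothetical, as is the minimum gap `delta` between consecutively ranked papers. A positive `delta` breaks ties between papers that received the same quantized score.

```python
from typing import List


def pav_nonincreasing(values: List[float]) -> List[float]:
    """Least-squares fit of a non-increasing sequence (pool-adjacent-violators)."""
    # Each block stores [sum of values, number of values]; its fitted value is the mean.
    blocks: List[list] = []
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean is below the last block's mean,
        # i.e. while the non-increasing constraint is violated.
        # Means are compared by cross-multiplying, since counts are positive.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted: List[float] = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def integrate_ranking(scores_in_rank_order: List[float], delta: float = 0.0) -> List[float]:
    """Closest scores (squared error) that respect one reviewer's ranking, best paper first.

    Hypothetical helper: consecutive papers in the ranking are forced apart by
    at least `delta`, which breaks ties between equally scored papers when delta > 0.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    # Substituting z_i = y_i + i * delta turns y_i - y_{i+1} >= delta into
    # z_i >= z_{i+1}, and the objective into sum_i (z_i - (s_i + i * delta))^2.
    shifted = [s + i * delta for i, s in enumerate(scores_in_rank_order)]
    fitted = pav_nonincreasing(shifted)
    # Undo the substitution to recover the updated scores y_i.
    return [z - i * delta for i, z in enumerate(fitted)]


if __name__ == "__main__":
    # Hypothetical reviewer: papers scored 6, 6, 4, ranked in that order.
    print(integrate_ranking([6, 6, 4], delta=0.5))  # [6.25, 5.75, 4.0]
    # Ranking contradicts the scores: the paper scored 8 is ranked third.
    print(integrate_ranking([6, 6, 8, 4]))
```

Because the same rule is applied to every reviewer, the updates are computed identically for all papers. The outputs are ordinary numeric scores that existing score-based tools can consume. The sketch treats each reviewer in isolation. It ignores, for example, how several reviewers' updated scores for one paper might interact.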