In this paper we revisit the 2014 NeurIPS experiment that examined inconsistency in conference peer review. We determine that 50\% of the variation in reviewer quality scores was subjective in origin. Further, seven years on from the experiment, we find that for \emph{accepted} papers there is no correlation between quality scores and the impact of the paper as measured by citation count. We trace the fate of rejected papers, recovering where these papers were eventually published. For these papers we do find a correlation between quality scores and impact. We conclude that the reviewing process for the 2014 conference was good at identifying poor papers but poor at identifying good ones. We offer some suggestions for improving the reviewing process, but also warn against removing its subjective element. Finally, we suggest that the real conclusion of the experiment is that the community should place less onus on the notion of ``top-tier conference publications'' when assessing the quality of individual researchers. For NeurIPS 2021, the program chairs are repeating the experiment, as well as conducting new ones.