The conference peer review process involves three constituencies with different objectives: authors want their papers accepted at prestigious venues (and quickly), conferences want to present a program with many high-quality and few low-quality papers, and reviewers want to avoid being overburdened by reviews. These objectives are far from aligned, primarily because the evaluation of a submission is inherently noisy. Over the years, conferences have experimented with numerous policies to navigate the tradeoffs. These experiments include setting various bars for acceptance, varying the number of reviews per submission, requiring prior reviews to be included with resubmissions, and others. In this work, we investigate, both analytically and empirically, how well various policies work, and more importantly, why they do or do not work. We model the conference-author interactions as a Stackelberg game in which a prestigious conference commits to an acceptance policy; the authors best-respond by (re)submitting or not (re)submitting to the conference in each round of review, the alternative being a "sure accept" (such as a lightly refereed venue). Our main results include the following observations: (1) the conference should typically set a higher acceptance threshold than the actual desired quality; we call this the "resubmission gap". (2) The reviewing load is heavily driven by resubmissions of borderline papers; therefore, a judicious choice of acceptance threshold may lead to fewer reviews while incurring an acceptable loss in conference quality. (3) Conference prestige, reviewer inaccuracy, and author patience all increase the resubmission gap, and thus increase the review load for a fixed level of conference quality. For robustness, we further consider different models of paper quality and compare our theoretical results to simulations based on plausible parameters estimated from real data.
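The dynamics described above can be illustrated with a minimal Monte-Carlo sketch. This is not the paper's calibrated model: paper qualities, review noise, prestige and sure-accept payoffs, the discount factor, and the myopic resubmission rule are all illustrative assumptions, chosen only to show how a higher acceptance threshold raises the average quality of accepted papers while resubmissions of rejected papers drive up the total review load.

```python
import random
import statistics

def simulate(threshold, prestige=2.0, sure_value=1.0, discount=0.9,
             noise_sd=0.5, max_rounds=5, n_papers=10_000, seed=0):
    """Toy simulation of the conference-author interaction.

    All parameter values are hypothetical. Each paper has a latent
    quality q ~ N(0, 1); each submission yields a noisy review score
    s = q + N(0, noise_sd). The author keeps resubmitting while the
    discounted prestige payoff still beats taking the sure-accept
    venue immediately (a simple myopic stopping heuristic, not the
    paper's equilibrium best response).
    """
    rng = random.Random(seed)
    accepted_qualities = []
    total_reviews = 0
    for _ in range(n_papers):
        q = rng.gauss(0.0, 1.0)  # latent paper quality
        for t in range(max_rounds):
            # Give up and take the sure accept once waiting costs too much.
            if prestige * discount**t <= sure_value:
                break
            total_reviews += 1  # one review per submission round
            score = q + rng.gauss(0.0, noise_sd)  # noisy evaluation
            if score >= threshold:
                accepted_qualities.append(q)
                break
    mean_quality = (statistics.mean(accepted_qualities)
                    if accepted_qualities else float("nan"))
    return mean_quality, total_reviews, len(accepted_qualities)
```

Comparing a lenient threshold with a strict one, e.g. `simulate(0.5)` versus `simulate(1.5)`, the strict conference accepts fewer papers of higher average quality but pays for it with more total reviews, since borderline papers are rejected and resubmitted repeatedly; this is the quality-versus-load tradeoff the abstract refers to.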