Machine learning (ML) and artificial intelligence (AI) conferences, including NeurIPS and ICML, have experienced a significant decline in peer review quality in recent years. To address this growing challenge, we introduce the Isotonic Mechanism, a computationally efficient approach to enhancing the accuracy of noisy review scores by incorporating authors' private assessments of their submissions. Under this mechanism, authors with multiple submissions are required to rank their papers in descending order of perceived quality. The raw review scores are then calibrated against this ranking to produce adjusted scores. We prove that authors are incentivized to report their rankings truthfully, because doing so maximizes their expected utility, modeled as an additive convex function of the adjusted scores. Moreover, the adjusted scores are shown to be more accurate than the raw scores, with the improvement particularly pronounced when the noise level is high and the author has many submissions, a scenario increasingly prevalent at large-scale ML/AI conferences. We further investigate whether submission quality information beyond a simple ranking can be truthfully elicited from authors. We establish that a necessary condition for truthful elicitation is that the mechanism be based on pairwise comparisons of the author's submissions. This result underscores the optimality of the Isotonic Mechanism: it elicits the most fine-grained truthful information among all mechanisms we consider. We then present several extensions, including a demonstration that the mechanism remains truthful even when authors have only partial rather than complete information about their submission quality. Finally, we discuss future research directions, focusing on the practical implementation of the mechanism and the further development of a theoretical framework inspired by it.
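To make the calibration step concrete, here is a minimal sketch. It assumes the standard least-squares (isotonic regression) formulation suggested by the mechanism's name: the adjusted scores are the projection of the raw scores onto the set of score vectors that are non-increasing in the author's reported order,

\[
\hat{y} \;=\; \operatorname*{argmin}_{z \,:\, z_{\pi(1)} \ge z_{\pi(2)} \ge \cdots \ge z_{\pi(n)}} \; \sum_{i=1}^{n} (y_i - z_i)^2,
\]

where \(y\) is the vector of raw review scores and \(\pi\) is the author's reported ranking. The function name isotonic_mechanism and the toy scores below are illustrative, not taken from the paper.

```python
import numpy as np

def isotonic_mechanism(raw_scores, ranking):
    """Project raw review scores onto the cone implied by the author's
    reported ranking (best paper first), via pool-adjacent-violators (PAVA).

    Hypothetical helper: a minimal sketch of the score-adjustment step,
    assuming a least-squares isotonic regression objective.
    """
    y = np.asarray(raw_scores, dtype=float)
    ordered = y[list(ranking)]        # scores in reported best-to-worst order
    means = list(ordered)             # block means maintained by PAVA
    counts = [1] * len(means)         # block sizes
    i = 0
    while i < len(means) - 1:
        if means[i] < means[i + 1]:   # violates the non-increasing order
            # Pool the two adjacent blocks into their weighted mean.
            total = means[i] * counts[i] + means[i + 1] * counts[i + 1]
            counts[i] += counts[i + 1]
            means[i] = total / counts[i]
            del means[i + 1], counts[i + 1]
            i = max(i - 1, 0)         # the merge may create a new violation upstream
        else:
            i += 1
    adjusted_ordered = np.repeat(means, counts)  # expand block means per paper
    adjusted = np.empty_like(y)
    adjusted[list(ranking)] = adjusted_ordered   # undo the permutation
    return adjusted

# Toy example: three submissions with raw scores; the author reports
# paper 0 as best, then paper 2, then paper 1.
print(isotonic_mechanism([6.2, 5.0, 7.8], [0, 2, 1]))
# Adjusted scores [7.0, 5.0, 7.0]: papers 0 and 2 are pooled to their
# average so that the adjusted scores respect the reported ranking.
```

PAVA runs in linear time after the initial reordering, consistent with the abstract's description of the mechanism as computationally efficient.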