We propose measurement integrity, a property related to ex post reward fairness, as a novel desideratum for peer prediction mechanisms in many natural applications. Like robustness against strategic reporting, the property that has been the primary focus of the peer prediction literature, measurement integrity is an important consideration for understanding the practical performance of peer prediction mechanisms. We perform computational experiments, both with an agent-based model and with real data, to empirically evaluate peer prediction mechanisms according to both of these important properties. Our evaluations simulate the application of peer prediction mechanisms to peer assessment -- a setting in which ex post fairness concerns are particularly salient. We find that peer prediction mechanisms, as proposed in the literature, largely fail to demonstrate significant measurement integrity in our experiments. We also find that theoretical properties concerning robustness against strategic reporting are somewhat noisy predictors of empirical performance. Further, there is an apparent trade-off between our two dimensions of analysis. The best-performing mechanisms in terms of measurement integrity are highly susceptible to strategic reporting. Ultimately, however, we show that supplementing mechanisms with realistic parametric statistical models can, in some cases, improve performance along both dimensions of our analysis and result in mechanisms that strike the best balance between them.