We propose measurement integrity, a property related to ex post reward fairness, as a novel desideratum for peer prediction mechanisms in many applications, including peer assessment. We operationalize this notion to evaluate the measurement integrity of different mechanisms in computational experiments. Our evaluations simulate the application of peer prediction mechanisms to peer assessment---a setting in which realistic models have been validated on real data and in which ex post fairness concerns are quite salient. We find that peer prediction mechanisms, as proposed in the literature, largely fail to demonstrate measurement integrity in our experiments. However, we also find that certain mechanisms can be supplemented with realistic parametric statistical models to improve their measurement integrity. In the same setting, we also evaluate an empirical notion of robustness against strategic behavior to complement the theoretical analyses of robustness that have been the main focus of the peer prediction literature. In this dimension of analysis, we again find that supplementing certain mechanisms with parametric statistical models can improve their empirical performance. Even so, we find that theoretical guarantees of robustness against strategic behavior are somewhat noisy predictors of empirical robustness. As a whole, our empirical methodology for quantifying desirable mechanism properties facilitates a more nuanced comparison between mechanisms than theoretical analysis alone. Ultimately, we find that there is a trade-off between our two dimensions of analysis: the best-performing mechanisms for measurement integrity are highly susceptible to strategic behavior. On the other hand, certain parametric peer prediction mechanisms are robust against all the strategic manipulations we consider while still achieving reasonable measurement integrity.