Crowdsourcing is a popular paradigm for soliciting forecasts on future events. As people may have different forecasts, how to aggregate the solicited forecasts into a single accurate prediction remains an important challenge, especially when no historical accuracy information is available for identifying experts. In this paper, we borrow ideas from the peer prediction literature and assess the prediction accuracy of participants using only the collected forecasts. This approach leverages the correlations among peer reports to cross-validate each participant's forecasts and allows us to assign a "peer assessment score" (PAS) to each agent as a proxy for the agent's prediction accuracy. We identify several empirically effective methods for generating PAS and propose an aggregation framework that uses PAS to identify experts and to boost the prediction accuracy of existing aggregators. We evaluate our methods on 14 real-world datasets and show that i) PAS generated by peer prediction methods can approximately reflect the prediction accuracy of agents, and ii) our aggregation framework achieves consistent and significant improvements in prediction accuracy over existing aggregators for both binary and multiple-choice questions under three popular accuracy measures: Brier score (mean squared error), log score (cross-entropy loss), and AUC-ROC.
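To make the weighting idea and the accuracy measures concrete, here is a minimal Python sketch, not the paper's method: it scores each agent with a toy proxy (agreement with the leave-one-out peer mean, standing in for the peer-prediction-based PAS, whose construction is not specified in this abstract), aggregates binary forecasts with softmax weights derived from those scores, and evaluates the aggregate with the Brier and log scores named above. All function names (`toy_pas`, `weighted_aggregate`) are illustrative assumptions.

```python
# Illustrative sketch only: the toy proxy below is NOT the paper's PAS,
# which is derived from peer prediction mechanisms.
import numpy as np

def brier_score(p, y):
    """Brier score: mean squared error between forecasts p and outcomes y."""
    return np.mean((p - y) ** 2)

def log_score(p, y, eps=1e-12):
    """Log score: cross-entropy loss of forecasts p against binary outcomes y."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def toy_pas(forecasts):
    """Toy proxy score (assumption, not the paper's PAS): negative squared
    distance of each agent's forecasts from the leave-one-out peer mean."""
    n = forecasts.shape[0]
    peer_mean = (forecasts.sum(axis=0) - forecasts) / (n - 1)
    return -np.mean((forecasts - peer_mean) ** 2, axis=1)

def weighted_aggregate(forecasts, scores):
    """Aggregate forecasts using softmax weights over the proxy scores."""
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ forecasts

# Example: 5 agents forecasting 4 binary events (rows = agents).
rng = np.random.default_rng(0)
forecasts = rng.uniform(0, 1, size=(5, 4))
outcomes = np.array([1, 0, 1, 1])
agg = weighted_aggregate(forecasts, toy_pas(forecasts))
print("Brier:", brier_score(agg, outcomes), "Log:", log_score(agg, outcomes))
```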