The output of predictive models is routinely recalibrated by reconciling low-level predictions with known derived quantities defined at higher levels of aggregation. For example, models predicting turnout probabilities at the individual level in U.S. elections can be adjusted so that their aggregation matches the observed vote totals in each state, thus producing better-calibrated predictions. In this research note, we provide theoretical grounding for one of the most commonly used recalibration strategies, known colloquially as the "logit shift." The logit shift is typically cast as a heuristic optimization problem, in which an adjustment is chosen to minimize the difference between the aggregated predictions and the target totals. We show that it in fact offers a fast and accurate approximation to a principled but often computationally impractical adjustment strategy: computing the posterior prediction probabilities conditional on the target totals. After deriving analytical bounds on the quality of the approximation, we illustrate the accuracy of the approach using Monte Carlo simulations. The simulations also confirm analytical results regarding the scenarios in which users of the simple logit shift can expect it to perform best: namely, when the aggregated targets comprise many individual predictions, and when the distribution of true probabilities is symmetric and tight around 0.5.
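As an illustration of the heuristic form of the logit shift described above, the following is a minimal sketch, not the authors' implementation. It assumes the adjustment is a single scalar `alpha` added to every individual prediction on the logit scale, and that `alpha` is found by bisection so that the shifted probabilities sum to the target total. The names `logit_shift`, `probs`, `target_total`, and `alpha` are hypothetical and chosen only for this example.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit(p):
    # Inverse of the logistic function; requires 0 < p < 1.
    return math.log(p / (1.0 - p))


def logit_shift(probs, target_total, tol=1e-10, max_iter=200):
    """Find a scalar alpha (assumed form of the shift) such that
    sum(sigmoid(logit(p_i) + alpha)) is approximately target_total.

    Returns (alpha, shifted_probs).
    """
    if not 0.0 < target_total < len(probs):
        raise ValueError("target_total must lie strictly between 0 and len(probs)")
    logits = [logit(p) for p in probs]

    def total(alpha):
        # Aggregated prediction after shifting every logit by alpha.
        return sum(sigmoid(l + alpha) for l in logits)

    # total() is strictly increasing in alpha, so expand a bracket
    # until it contains the target, then bisect.
    lo, hi = -1.0, 1.0
    while total(lo) > target_total:
        lo *= 2.0
    while total(hi) < target_total:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total(mid) < target_total:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    alpha = 0.5 * (lo + hi)
    return alpha, [sigmoid(l + alpha) for l in logits]


# Hypothetical example: four individual turnout probabilities whose
# sum (1.8) is below an assumed observed total of 2.5.
alpha, shifted = logit_shift([0.2, 0.5, 0.7, 0.4], 2.5)
```

Because every logit receives the same shift, the ranking of individuals is preserved and only the overall level moves. The abstract's claim is that probabilities adjusted this way approximate the posterior probabilities conditional on the target total; this sketch shows only the shift itself, not that posterior computation.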