In recent decades, multilevel regression and poststratification (MRP) has surged in popularity for population inference. However, the validity of its estimates can depend on details of the model, and there is currently little research on how to validate it. We explore how leave-one-out cross-validation (LOO) can be used to compare Bayesian models for MRP. We investigate two approximations of LOO: Pareto smoothed importance sampling (PSIS-LOO) and a survey-weighted alternative (WTD-PSIS-LOO). Using two simulation designs, we examine how accurately these two criteria recover the correct ordering of the models by how well they predict population-level and small-area-level estimands. Focusing first on variable selection, we find that neither PSIS-LOO nor WTD-PSIS-LOO correctly recovers the models' order for an MRP population estimand, although both criteria correctly identify the best and worst models. For small-area estimation, the best model differs across small areas, highlighting the complexity of MRP validation. When comparing models with different priors, the ordering recovered by the criteria appears slightly more accurate at smaller area levels. These findings suggest that, while not performing poorly, PSIS-LOO-based ranking techniques may not be suitable for evaluating MRP as a method. We suggest this is due to the aggregation stage of MRP, where individual-level prediction errors average out. In practice, therefore, PSIS-LOO-based model validation tools need to be used with caution and might not convey the full story when validating MRP.
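A minimal, hypothetical sketch of the aggregation argument follows. It is not a reproduction of the study's simulations: all numbers are invented, it uses point estimates instead of posterior draws, and it uses cell-level squared error as a simple stand-in for the individual-level predictive scores that LOO actually targets (LOO scores log predictive density). The helper names `poststratify` and `mean_squared_cell_error` are illustrative. The sketch shows that a model with larger cell-level errors can still produce a more accurate poststratified population estimate when its errors cancel under the population-count weights, so rankings at the two levels need not agree.

```python
# Hypothetical illustration only: the cell counts, true cell means, and model
# predictions are invented and are not results from the study.

def poststratify(cell_estimates, cell_counts):
    """Poststratification step: population-count-weighted mean of cell estimates."""
    total = sum(cell_counts)
    return sum(n * theta for n, theta in zip(cell_counts, cell_estimates)) / total


def mean_squared_cell_error(cell_estimates, truth):
    """Unweighted mean squared error across cells (stand-in for individual-level accuracy)."""
    return sum((e - t) ** 2 for e, t in zip(cell_estimates, truth)) / len(truth)


counts = [100, 300, 300, 100]   # hypothetical population count per cell
truth = [0.2, 0.4, 0.6, 0.8]    # hypothetical true cell means

model_a = [0.25, 0.45, 0.65, 0.85]  # small errors, all +0.05 (do not cancel)
model_b = [0.10, 0.50, 0.50, 0.90]  # larger errors of -0.1/+0.1 (cancel when weighted)

true_pop = poststratify(truth, counts)

for name, preds in [("A", model_a), ("B", model_b)]:
    cell_mse = mean_squared_cell_error(preds, truth)
    pop_err = abs(poststratify(preds, counts) - true_pop)
    print(f"model {name}: cell MSE = {cell_mse:.4f}, population error = {pop_err:.4f}")
```

Here model A wins at the cell level while model B wins at the population level, which mirrors, in a simplified form, why individual-level criteria may not recover the ordering for an aggregated estimand.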