Statistical models often require inputs that are not completely known. This can occur when inputs are measured with error, measured indirectly, or predicted by another model. In environmental epidemiology, air pollution exposure is a key determinant of health, yet it typically must be estimated for each observational unit by a complex model. Bayesian two-stage models combine this stage-one exposure model with a stage-two model for the health outcome given the exposure. However, analysts usually have access only to the stage-one model output, without all of its specifications or input data, which makes joint Bayesian inference appear intractable. We show that two prominent workarounds (using a point estimate, or using the posterior from the stage-one model without feedback from the stage-two model) lead to miscalibrated inference. Instead, we propose efficient algorithms that facilitate joint Bayesian inference and provide more accurate estimates and well-calibrated uncertainties. We compare these approaches in an analysis of the association between PM2.5 exposure and county-level mortality rates in the South-Central USA.