Causal inference from longitudinal studies is central to epidemiologic research. Targeted Maximum Likelihood Estimation (TMLE) is an established doubly robust method for estimating causal effects, but it is unclear how missing data should be handled when TMLE is used with data-adaptive approaches. Based on motivating data from the Victorian Adolescent Health Cohort Study, we conducted simulation and case studies to evaluate the performance of methods for handling missing data when using TMLE. The methods were complete-case analysis; an extended TMLE method incorporating a model for the outcome missingness mechanism; the missing indicator method for missing covariate data; and six multiple imputation (MI) approaches using parametric or machine-learning models to handle missing outcome, exposure, and covariate data. The simulation study considered a simple scenario (the exposure and outcome generated from main-effects regressions) and two complex scenarios (models also included interactions), alongside eleven missingness mechanisms defined using causal diagrams. No approach performed well across all scenarios and missingness mechanisms. For non-MI methods, bias depended on the missingness mechanism (little bias when the outcome did not influence missingness in any variable). For parametric MI, bias depended on the missingness mechanism (smaller when the outcome did not directly influence outcome missingness) and on the data generation scenario (larger for the complex scenarios). Including interaction terms in the imputation model improved performance. For MI using machine learning, bias depended on the missingness mechanism (smaller when no variable with missing data directly influenced outcome missingness). We recommend considering the missing data mechanism and, if using MI, opting for a saturated parametric or data-adaptive imputation model for handling missing data in TMLE estimation.
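The dependence of complete-case bias on the missingness mechanism can be illustrated with a small simulation. The sketch below is not the study's simulation design and does not use TMLE: it assumes a hypothetical data-generating model (one confounder `w`, a binary exposure `a`, a continuous outcome `y` with a true effect of 1.0) and swaps in ordinary least-squares regression adjustment as the effect estimator. All function names, coefficients, and sample sizes are illustrative assumptions.

```python
import numpy as np

TRUE_EFFECT = 1.0  # assumed true exposure effect in this toy model


def expit(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate(n, rng):
    """Hypothetical main-effects data: confounder w, exposure a, outcome y."""
    w = rng.normal(size=n)
    a = rng.binomial(1, expit(0.5 * w))
    y = TRUE_EFFECT * a + 0.8 * w + rng.normal(size=n)
    return w, a, y


def complete_case_effect(w, a, y, observed):
    """Regression-adjusted exposure effect using only rows with observed y."""
    X = np.column_stack([np.ones(observed.sum()), a[observed], w[observed]])
    beta, *_ = np.linalg.lstsq(X, y[observed], rcond=None)
    return beta[1]  # coefficient on the exposure


def run(n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    w, a, y = simulate(n, rng)
    # Mechanism 1: outcome missingness depends only on the confounder w.
    obs_w = rng.random(n) < expit(1.0 + 1.0 * w)
    # Mechanism 2: outcome missingness depends on the outcome itself.
    obs_y = rng.random(n) < expit(1.0 - 1.0 * y)
    return {
        "bias_missing_depends_on_w": complete_case_effect(w, a, y, obs_w) - TRUE_EFFECT,
        "bias_missing_depends_on_y": complete_case_effect(w, a, y, obs_y) - TRUE_EFFECT,
    }


if __name__ == "__main__":
    for name, bias in run().items():
        print(f"{name}: {bias:+.3f}")
```

In this toy setting the complete-case estimate should stay close to the true effect when missingness depends only on the adjusted confounder, and drift away from it when the outcome drives its own missingness, which mirrors the qualitative pattern reported for the non-MI methods.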