Background: Multiple imputation is often used to reduce bias and gain efficiency when data are missing. The most appropriate imputation method depends on the model the analyst intends to fit. Several imputation approaches have been proposed for the case where this model is a logistic regression with an interaction term that contains a binary, partially observed variable; however, it is unclear which performs best under particular parameter settings. Methods: Using 1000 simulations, each with 10,000 observations, under six data-generating mechanisms (DGMs), we investigate the performance of four methods: (i) 'passive imputation', (ii) 'just another variable' (JAV), (iii) 'stratify-impute-append' (SIA), and (iv) 'substantive model compatible fully conditional specification' (SMCFCS). The application of each method is shown in an empirical example using England-based cancer registry data. Results: SMCFCS and SIA gave the least biased estimates of the coefficients for the fully observed variable, the partially observed variable, and the interaction term. SMCFCS and SIA showed good coverage and low relative error for all DGMs. SMCFCS showed large bias when there was a low prevalence of the fully observed variable in the interaction. SIA performed poorly when the fully observed variable in the interaction had a continuous underlying form. Conclusion: SMCFCS and SIA give consistent estimation for logistic regression models with an interaction term when data are missing at random, and either can be used in most analyses. SMCFCS performed better than SIA when the fully observed variable in the interaction had an underlying continuous form. Researchers should be cautious about using SMCFCS when there is a low prevalence of the fully observed variable in the interaction.
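To make the 'stratify-impute-append' idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes SIA means: split the data by the fully observed variable in the interaction, impute the partially observed variable separately within each stratum, then append the strata back together. The coefficients, missingness probabilities, the Beta(1, 1) prior, and the names `simulate` and `sia_impute` are all hypothetical choices made for illustration.

```python
import math
import random


def simulate(n, rng):
    """Generate (x, y, z) rows; z is None where it is missing.

    All numeric values here are illustrative assumptions, not the paper's DGMs.
    """
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)              # fully observed binary variable
        z = int(rng.random() < 0.3 + 0.2 * x)    # binary variable, later partly missing
        lin = -1.0 + 0.5 * x + 0.7 * z + 0.6 * x * z   # logistic model with x*z interaction
        y = int(rng.random() < 1 / (1 + math.exp(-lin)))
        # Missing at random: missingness of z depends only on observed x and y.
        missing = rng.random() < 0.1 + 0.2 * x + 0.2 * y
        rows.append((x, y, None if missing else z))
    return rows


def sia_impute(rows, rng):
    """Return one completed data set using a simplified stratify-impute-append."""
    completed = []
    for xv in (0, 1):                                    # stratify on x
        stratum = [r for r in rows if r[0] == xv]
        prob = {}
        for yv in (0, 1):
            seen = [z for (_, y, z) in stratum if y == yv and z is not None]
            n1 = sum(seen)
            n0 = len(seen) - n1
            # Draw P(z = 1 | y, stratum) from its Beta posterior (assumed Beta(1, 1) prior)
            # so that each imputation reflects parameter uncertainty.
            prob[yv] = rng.betavariate(1 + n1, 1 + n0)
        for (x, y, z) in stratum:                        # impute within the stratum
            if z is None:
                z = int(rng.random() < prob[y])
            completed.append((x, y, z))                  # append strata back together
    return completed


rng = random.Random(0)
data = simulate(10_000, rng)
imputations = [sia_impute(data, rng) for _ in range(5)]
```

Each completed data set would then be analysed with the logistic regression of interest and the estimates pooled, for example with Rubin's rules; that step is omitted here. Because the imputation model is fitted separately within each level of the stratifying variable, this sketch only applies when that variable is categorical.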