In cluster-randomized trials (CRTs), missing data can arise in several ways: missing outcomes, missing baseline covariates at the individual or cluster level, or entirely missing information for non-participants. Of these, missing outcomes have attracted the most attention. However, because of their complexity, no existing method addresses all of these types of missing data simultaneously. This methodological gap may lead to confusion and potential pitfalls in the analysis of CRTs. In this article, we propose a doubly robust estimator for a variety of estimands that simultaneously handles missing outcomes under a missing-at-random assumption, missing covariates with the missing-indicator method (with no constraint on missing covariate distributions), and missing cluster-population sizes via a uniform sampling framework. Furthermore, we provide three approaches to improve precision: choosing the optimal weights for intracluster correlation, leveraging machine learning, and modeling the propensity score for treatment assignment. To evaluate the impact of violated missing-data assumptions, we additionally propose a sensitivity analysis that measures when missing data alter the conclusion of treatment effect estimation. Simulation studies and data applications both show that our proposed method is valid and outperforms existing methods.
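
As a rough illustration of the missing-indicator method mentioned above, the sketch below fills each missing covariate value with a constant and appends a 0/1 column flagging where the value was missing. This is a generic textbook version of the technique, not the estimator proposed in the article. The function name `missing_indicator_augment`, the fill value of 0, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def missing_indicator_augment(X, fill_value=0.0):
    """Hypothetical helper: fill NaNs with a constant and append missingness flags.

    Returns an array [X_filled, M], where M holds one 0/1 indicator column
    for each covariate that has at least one missing value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                        # True where a covariate is missing
    X_filled = np.where(missing, fill_value, X)  # constant fill (assumed value: 0)
    has_missing = missing.any(axis=0)            # covariates that need an indicator
    indicators = missing[:, has_missing].astype(float)
    return np.hstack([X_filled, indicators])

# Toy covariate matrix: 3 individuals, 2 baseline covariates, NaN marks missing.
X = np.array([[1.0, np.nan],
              [2.0, 5.0],
              [np.nan, 7.0]])
X_aug = missing_indicator_augment(X)
```

The augmented matrix can then be passed to any covariate-adjusted outcome model; the indicator columns let the model fit a separate intercept shift for observations whose covariate was missing.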