Background: Missing data is a pervasive problem in epidemiology, with complete records analysis (CRA) and multiple imputation (MI) the most common methods for handling incomplete data. MI is valid when incomplete variables are independent of response indicators, conditional on complete variables. However, this can be hard to assess when several variables are incomplete. Previous literature has shown that MI may be valid in subsamples of the data even when it is not valid in the full dataset. Current guidance on how to decide whether MI is appropriate is lacking. Methods: We develop an algorithm that is sufficient to indicate when MI will estimate an exposure-outcome coefficient without bias, and show how to implement it using directed acyclic graphs (DAGs). We extend the algorithm to investigate whether MI applied to a subsample of the data, in which some variables are complete and the remaining variables are imputed, will be unbiased for the same estimand. We demonstrate the algorithm by applying it to several simple examples and a more complex real-life example. Conclusions: Multiple incomplete variables are common in practice. Assessing whether CRA and MI can each plausibly estimate an exposure-outcome association without bias is crucial when analysing data and interpreting results. Our algorithm provides researchers with the tools to decide whether (and how) to use MI in practice. Further work could focus on the likely size and direction of biases, and the impact of different missing data patterns.
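The validity condition stated in the Background (incomplete variables independent of response indicators, conditional on complete variables) can be read off a DAG as a d-separation statement. The following is a minimal sketch of that single check, not the algorithm developed in the paper. The DAGs, the node names (`X`, `Y`, `C`, `R_Y`, `R_C`) and the helper functions are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def ancestors(dag, nodes):
    """Return `nodes` together with all of their ancestors.

    `dag` maps each node to the set of its parents (hypothetical encoding).
    """
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dag.get(node, ()))
    return seen


def d_separated(dag, xs, ys, zs):
    """True if every node in xs is d-separated from every node in ys given zs.

    Uses the standard moralised ancestral graph criterion.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(dag, xs | ys | zs)

    # Moralise: link each parent to its child and all co-parents to each other.
    adj = {node: set() for node in keep}
    for child in keep:
        parents = list(dag.get(child, ()))
        for p in parents:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(parents, 2):
            adj[p].add(q)
            adj[q].add(p)

    # Search from xs while never passing through a conditioning node.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in ys:
            return False
        stack.extend(m for m in adj[node] if m not in zs and m not in seen)
    return True


# Hypothetical DAG A: exposure X is complete; outcome Y and confounder C are
# incomplete; both response indicators depend only on X.
dag_a = {"C": set(), "X": {"C"}, "Y": {"C", "X"}, "R_Y": {"X"}, "R_C": {"X"}}

# Hypothetical DAG B: as DAG A, but missingness in Y also depends on Y itself.
dag_b = {"C": set(), "X": {"C"}, "Y": {"C", "X"}, "R_Y": {"X", "Y"}, "R_C": {"X"}}

incomplete, indicators, complete = {"Y", "C"}, {"R_Y", "R_C"}, {"X"}
condition_holds_a = d_separated(dag_a, incomplete, indicators, complete)
condition_holds_b = d_separated(dag_b, incomplete, indicators, complete)
print(condition_holds_a, condition_holds_b)
```

In this sketch the condition holds in DAG A and fails in DAG B, where the outcome directly affects its own missingness. D-separation is only a graphical route to the stated independence; whether MI is then unbiased for a particular estimand, in the full data or in a subsample, is the question the paper's algorithm addresses.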