Censored, missing, and error-prone covariates are all coarsened data types for which the true values are unknown. Many methods for handling the unobserved values, including imputation, are shared across these data types, with nuances that depend on the mechanism behind the unobserved values and on any other available information. For example, in prospective studies, the time to a specific disease diagnosis will be incompletely observed if only some patients are diagnosed by the end of follow-up. Specifically, some times will be randomly right-censored, and patients' disease-free follow-up times must be incorporated into their imputed values. Assuming noninformative censoring, these censored values are replaced with their conditional means, which are calculated by (i) estimating the conditional distribution of the censored covariate and (ii) integrating over the corresponding survival function. Semiparametric approaches are common; they estimate the distribution with a Cox proportional hazards model and then approximate the integral with the trapezoidal rule. While these approaches offer robustness, they come at the cost of statistical and computational efficiency. We propose a general framework for parametric conditional mean imputation of censored covariates that offers better statistical precision and a lower computational burden by modeling the survival function parametrically, in which case the conditional means often have an analytic solution. The framework is implemented in the open-source R package speedyCMI.
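To make the contrast between numerical and analytic conditional means concrete, the following is a minimal sketch in Python, not the speedyCMI implementation. It assumes, purely for illustration, an exponential survival function with no additional covariates; the function names, the rate, the censoring time, and the upper integration limit are all hypothetical. It relies on the standard identity that, for a value right-censored at c, the conditional mean is c plus the integral of S(x) from c to infinity, divided by S(c).

```python
import math


def conditional_mean_trapezoid(surv, c, upper, n=10_000):
    """Approximate E[X | X > c] = c + (1 / S(c)) * integral_c^upper S(x) dx.

    The integral is truncated at `upper` (an assumption of this sketch)
    and evaluated with the composite trapezoidal rule on n subintervals.
    """
    h = (upper - c) / n
    total = 0.5 * (surv(c) + surv(upper))
    for i in range(1, n):
        total += surv(c + i * h)
    return c + h * total / surv(c)


def conditional_mean_exponential(rate, c):
    """Closed form under an exponential model: E[X | X > c] = c + 1 / rate."""
    return c + 1.0 / rate


# Hypothetical values chosen only for illustration.
rate = 0.5  # exponential rate parameter
c = 2.0     # censoring time (disease-free follow-up)


def surv(x):
    """Exponential survival function S(x) = exp(-rate * x)."""
    return math.exp(-rate * x)


numeric = conditional_mean_trapezoid(surv, c, upper=c + 60.0)
analytic = conditional_mean_exponential(rate, c)
print(f"trapezoidal: {numeric:.6f}, analytic: {analytic:.6f}")
```

In this toy setting the numerical route needs thousands of survival-function evaluations per censored value, while the closed form is a single arithmetic expression, which is the kind of computational saving the abstract describes.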