Standard parametric and semi-parametric regression methods are inadequate for large clustered survival studies when the appropriate functional forms of the covariates and their interactions in the hazard function are unknown, and when the random cluster effects and cluster-level covariates are spatially correlated. We present a general nonparametric method for such studies under the Bayesian ensemble learning paradigm known as Soft Bayesian Additive Regression Trees. The methodological and computational challenges include a large number of clusters, variable cluster sizes, and proper statistical augmentation of an unobservable cluster-level covariate using a data registry different from the main survival study. We address the computational challenges with an innovative 3-step approach based on latent variables. We illustrate the method and its advantages over existing methods by assessing the impact of interventions on selected county-level and patient-level covariates aimed at mitigating the existing racial disparity in breast cancer survival across 67 Florida counties (clusters), using two different data resources. The Florida Cancer Registry (FCR) provides the clustered survival data with patient-level covariates, and the Behavioral Risk Factor Surveillance Survey (BRFSS) provides additional information on an unobservable county-level covariate, Screening Mammography Utilization (SMU).