Popular parametric and semiparametric hazards regression models for clustered survival data are inappropriate and inadequate when the unknown effects of different covariates and clustering are complex. This calls for a flexible modeling framework that yields efficient survival prediction. Moreover, in survival studies of the time to occurrence of asymptomatic events, survival times are typically interval censored between consecutive clinical inspections. In this article, we propose a robust semiparametric model for clustered interval-censored survival data under a paradigm of Bayesian ensemble learning called Soft Bayesian Additive Regression Trees, or SBART (Linero and Yang, 2018), which combines multiple sparse (soft) decision trees to attain excellent predictive accuracy. We develop a novel semiparametric hazards regression model by modeling the hazard function as a product of a parametric baseline hazard function and a nonparametric component that uses SBART to incorporate clustering, unknown functional forms of the main effects, and interaction effects of various covariates. Our methodology applies to left-censored, right-censored, and interval-censored survival data, and it is implemented using a data augmentation scheme that allows existing Bayesian backfitting algorithms to be used. We illustrate the practical implementation and advantages of our method via simulation studies and an analysis of a prostate cancer surgery study in which dependence on the experience and skill level of the physicians leads to clustering of survival times. We conclude by discussing our method's applicability in studies involving high-dimensional data with complex underlying associations.
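To make the product-form hazard and the three censoring types concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Weibull baseline hazard and, for simplicity, a time-constant multiplier exp(r), where the scalar `r` is a hypothetical stand-in for the value of the nonparametric (SBART) component for one subject. The function names and default parameter values are illustrative assumptions. Under these assumptions the survival function is S(t) = exp(-exp(r) * H0(t)), and an observation known only to lie in (left, right] contributes S(left) - S(right) to the likelihood.

```python
import math


def weibull_cum_hazard(t, shape, scale):
    """Cumulative baseline hazard H0(t) = (t / scale) ** shape (assumed Weibull baseline)."""
    if t == math.inf:
        return math.inf
    return (t / scale) ** shape


def survival(t, r, shape=1.5, scale=10.0):
    """S(t) = exp(-exp(r) * H0(t)); r is a stand-in for the nonparametric component."""
    return math.exp(-math.exp(r) * weibull_cum_hazard(t, shape, scale))


def interval_likelihood(left, right, r, shape=1.5, scale=10.0):
    """Likelihood contribution P(left < T <= right) = S(left) - S(right).

    left = 0 corresponds to a left-censored observation,
    right = math.inf to a right-censored observation.
    """
    return survival(left, r, shape, scale) - survival(right, r, shape, scale)
```

For example, `interval_likelihood(2.0, 5.0, r=0.3)` gives the contribution of an event known to occur between inspections at times 2 and 5, while `interval_likelihood(5.0, math.inf, r=0.3)` gives the contribution of a subject still event-free at time 5.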