A new tool, riAFT-BART, was recently developed to draw causal inferences about population treatment effects on patient survival from clustered and censored survival data while accounting for the multilevel data structure. The practical utility of this tool goes beyond the estimation of the population average treatment effect. In this work, we show how riAFT-BART can be used to address two important statistical questions with clustered survival data: estimating treatment effect heterogeneity and selecting variables. Leveraging likelihood-based machine learning, we describe how to draw posterior samples of the individual survival treatment effect from riAFT-BART model runs, and we use these samples in an exploratory treatment effect heterogeneity analysis to identify subpopulations whose treatment effects may differ from the population average effect. For variable selection, we propose a permutation-based approach that uses each predictor's variable inclusion proportion supplied by the riAFT-BART model. To address the missing data frequently encountered in health databases, we propose a strategy that combines bootstrap imputation with riAFT-BART for variable selection with incomplete clustered survival data. We conduct an extensive simulation study to examine the practical operating characteristics of our proposed methods, and provide empirical evidence that they perform better than several existing methods across a wide range of data scenarios. Finally, we demonstrate the methods via a case study that identifies predictors of in-hospital mortality among patients with severe COVID-19 and estimates the heterogeneous treatment effects of three COVID-specific medications. The methods developed in this work are readily available in the R package $\textsf{riAFTBART}$.
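To make the permutation idea concrete, the sketch below shows one generic way a permutation null can be used to screen predictors by an importance share. It is not the riAFTBART implementation and makes several simplifying assumptions: the importance score is a stand-in (each predictor's absolute correlation with the outcome, scaled to sum to 1) rather than a BART variable inclusion proportion; censoring and clustering are ignored; and the selection rule (keep a predictor if its observed share exceeds the $1-\alpha$ quantile of its own permutation null) is one common choice, not one stated in the abstract. The names `importance_shares`, `permutation_select`, and `make_toy_data` are hypothetical.

```python
import random


def importance_shares(columns, y):
    """Stand-in for variable inclusion proportions (an assumption, not BART):
    each column's absolute Pearson correlation with y, scaled to sum to 1."""
    n = len(y)
    y_mean = sum(y) / n
    y_dev = [v - y_mean for v in y]
    y_ss = sum(d * d for d in y_dev)
    raw = []
    for col in columns:
        c_mean = sum(col) / n
        c_dev = [v - c_mean for v in col]
        c_ss = sum(d * d for d in c_dev)
        cross = sum(a * b for a, b in zip(c_dev, y_dev))
        denom = (c_ss * y_ss) ** 0.5
        raw.append(abs(cross) / denom if denom > 0 else 0.0)
    total = sum(raw)
    if total == 0:
        # Degenerate case: no column carries any signal, so share equally.
        return [1.0 / len(columns)] * len(columns)
    return [r / total for r in raw]


def permutation_select(columns, y, n_perm=200, alpha=0.05, seed=0):
    """Keep predictors whose observed share exceeds the (1 - alpha) quantile
    of their own null distribution, built by permuting the outcome."""
    rng = random.Random(seed)
    observed = importance_shares(columns, y)
    null = [[] for _ in columns]
    y_perm = list(y)
    for _ in range(n_perm):
        # Shuffling y breaks any predictor-outcome association.
        rng.shuffle(y_perm)
        for j, share in enumerate(importance_shares(columns, y_perm)):
            null[j].append(share)
    cut = min(n_perm - 1, int((1 - alpha) * n_perm))
    selected = []
    for j, draws in enumerate(null):
        threshold = sorted(draws)[cut]
        if observed[j] > threshold:
            selected.append(j)
    return selected, observed


def make_toy_data(n=200, n_noise=10, seed=1):
    """Toy data: columns 0 and 1 drive the outcome; the rest are pure noise."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2 + n_noise)]
    y = [2.0 * columns[0][i] - 1.5 * columns[1][i] + rng.gauss(0, 1)
         for i in range(n)]
    return columns, y
```

Calling `permutation_select(*make_toy_data())` returns the indices of the retained predictors together with their observed shares. In a real analysis, the score function would be replaced by a model refit on each permuted outcome, which is far more expensive than this correlation-based stand-in.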