Survival random forest is a popular machine learning tool for modeling censored survival data. However, there is currently no statistically valid and computationally feasible approach for estimating its confidence band. This paper proposes an unbiased confidence band estimation method that extends recent developments in infinite-order incomplete U-statistics. The idea is to estimate the variance-covariance matrix of the cumulative hazard function prediction on a grid of time points. We then generate the confidence band by viewing the cumulative hazard function estimate as a Gaussian process whose distribution can be approximated through simulation. This approach is computationally easy to implement when the subsample size of each tree is no larger than half of the total training sample size. Numerical studies show that the proposed method accurately estimates the confidence band and achieves the desired coverage rate. We apply the method to the Veterans' Administration lung cancer data.
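The simulation step described above can be illustrated with a minimal sketch. It assumes that a point estimate of the cumulative hazard and its variance-covariance matrix on a time grid are already available; the U-statistic-based covariance estimator itself is not reproduced here. The function name `simulate_confidence_band`, the toy grid, and the toy covariance kernel are hypothetical choices made for illustration, and the sup-type critical value shown is one common way to turn Gaussian draws into a simultaneous band, not necessarily the exact construction used in the paper.

```python
import numpy as np


def simulate_confidence_band(estimate, cov, alpha=0.05, n_sim=10_000, seed=0):
    """Simultaneous band from a Gaussian approximation (illustrative sketch).

    estimate : cumulative hazard estimates on a time grid, shape (k,)
    cov      : estimated variance-covariance matrix on that grid, shape (k, k)
    """
    estimate = np.asarray(estimate, dtype=float)
    cov = np.asarray(cov, dtype=float)
    se = np.sqrt(np.diag(cov))  # pointwise standard errors

    # Draw mean-zero Gaussian vectors with the estimated covariance.
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(estimate.size), cov, size=n_sim)

    # For each draw, take the largest standardized absolute deviation over the grid.
    max_dev = np.max(np.abs(draws) / se, axis=1)

    # The (1 - alpha) quantile of that maximum is the simultaneous critical value.
    crit = np.quantile(max_dev, 1 - alpha)
    return estimate - crit * se, estimate + crit * se, crit


# Toy inputs (assumptions, not data from the paper).
grid = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
est = 0.1 * grid                          # made-up cumulative hazard estimates
toy_cov = 0.002 * np.minimum.outer(grid, grid)  # made-up positive-definite covariance

lower, upper, crit = simulate_confidence_band(est, toy_cov)
print("critical value:", round(float(crit), 3))
print("lower:", np.round(lower, 3))
print("upper:", np.round(upper, 3))
```

Because the critical value is taken from the distribution of the maximum deviation over the whole grid, it is larger than the pointwise normal quantile, which is what makes the resulting band simultaneous rather than pointwise.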