Accelerated and interpretable oblique random survival forests

The oblique random survival forest (RSF) is an ensemble supervised learning method for right-censored outcomes. Trees in the oblique RSF are grown using linear combinations of predictors to create branches, whereas in the standard RSF, a single predictor is used. Oblique RSF ensembles often have higher prediction accuracy than standard RSF ensembles. However, assessing all possible linear combinations of predictors induces significant computational overhead that limits applications to large-scale data sets. In addition, few methods have been developed for interpretation of oblique RSF ensembles, and they remain more difficult to interpret compared to their axis-based counterparts. We introduce a method to increase computational efficiency of the oblique RSF and a method to estimate importance of individual predictor variables with the oblique RSF. Our strategy to reduce computational overhead makes use of Newton-Raphson scoring, a classical optimization technique that we apply to the Cox partial likelihood function within each non-leaf node of decision trees. We estimate the importance of individual predictors for the oblique RSF by negating each coefficient used for the given predictor in linear combinations, and then computing the reduction in out-of-bag accuracy. In general benchmarking experiments, we find that our implementation of the oblique RSF is approximately 450 times faster with equivalent discrimination and superior Brier score compared to existing software for oblique RSFs. We find in simulation studies that 'negation importance' discriminates between relevant and irrelevant predictors more reliably than permutation importance, Shapley additive explanations, and a previously introduced technique to measure variable importance with oblique RSFs based on analysis of variance. Methods introduced in the current study are available in the aorsf R package.
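The two ideas above can be illustrated with a minimal Python sketch. It is not the aorsf implementation, and every function name in it is hypothetical. It makes several simplifying assumptions: a single Newton-Raphson step started from zero, Breslow-style handling of tied times, one linear combination standing in for a whole forest, a held-out half of simulated data standing in for out-of-bag observations, and Harrell's concordance index as the accuracy measure.

```python
import numpy as np


def cox_newton_step(X, time, status, beta):
    """One Newton-Raphson update of beta for the Cox partial likelihood."""
    eta = X @ beta
    w = np.exp(eta - eta.max())  # rescaled risk weights; ratios are unchanged
    p = X.shape[1]
    grad = np.zeros(p)
    hess = np.zeros((p, p))
    for i in np.flatnonzero(status == 1):
        at_risk = time >= time[i]          # risk set at this event time
        wr, Xr = w[at_risk], X[at_risk]
        s0 = wr.sum()
        xbar = (wr[:, None] * Xr).sum(axis=0) / s0   # weighted mean of predictors
        s2 = (Xr * wr[:, None]).T @ Xr / s0          # weighted second moment
        grad += X[i] - xbar
        hess -= s2 - np.outer(xbar, xbar)
    # Newton update: beta - H^{-1} g, where H is negative definite
    return beta - np.linalg.solve(hess, grad)


def concordance(risk, time, status):
    """Harrell's C: a higher risk score should go with an earlier event."""
    usable = (time[:, None] < time[None, :]) & (status[:, None] == 1)
    higher = risk[:, None] > risk[None, :]
    tied = risk[:, None] == risk[None, :]
    return (np.sum(usable & higher) + 0.5 * np.sum(usable & tied)) / np.sum(usable)


def negation_importance(X, time, status, beta):
    """Drop in concordance after flipping the sign of each coefficient."""
    baseline = concordance(X @ beta, time, status)
    importance = np.empty(len(beta))
    for j in range(len(beta)):
        flipped = beta.copy()
        flipped[j] = -flipped[j]           # negate the coefficient of predictor j
        importance[j] = baseline - concordance(X @ flipped, time, status)
    return importance


# Simulated right-censored data: only the first predictor is relevant.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
true_beta = np.array([1.0, 0.0, 0.0])
event_time = rng.exponential(scale=np.exp(-X @ true_beta))
censor_time = rng.exponential(scale=2.0, size=n)
time = np.minimum(event_time, censor_time)
status = (event_time <= censor_time).astype(int)

# Fit on the first half; score on the second half (a stand-in for out-of-bag data).
train, test = slice(0, 200), slice(200, 400)
beta_hat = cox_newton_step(X[train], time[train], status[train], np.zeros(3))
importance = negation_importance(X[test], time[test], status[test], beta_hat)
print("one-step coefficients:", np.round(beta_hat, 3))
print("negation importance:  ", np.round(importance, 3))
```

In this sketch, negating the coefficient of the relevant predictor reverses most of the risk ordering and costs a large amount of concordance, while negating a near-zero coefficient changes the ordering very little. In a full oblique RSF the negation would be applied to that predictor's coefficient in every linear combination across all trees before recomputing out-of-bag predictions.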
