Random forests are a sensible non-parametric model for predicting competing risks data from covariates. However, no existing package adequately handles large datasets ($n > 100{,}000$). We introduce a new R package, largeRCRF, based on the random competing risks forest theory developed by Ishwaran et al. (2014). We verify the package's validity and accuracy through simulation studies and show that its results are comparable to those of randomForestSRC while taking less time to run. We also demonstrate the package on a large dataset that was previously impractical to analyze, using hardware available to most researchers.