Survival regression is used to estimate the relation between time-to-event and feature variables, and is important in application domains such as medicine, marketing, risk management and sales management. Nonlinear tree-based machine learning algorithms as implemented in libraries such as XGBoost, scikit-learn, LightGBM, and CatBoost are often more accurate in practice than linear models. However, existing state-of-the-art implementations of tree-based models have offered limited support for survival regression. In this work, we implement loss functions for learning accelerated failure time (AFT) models in XGBoost, to increase the support for survival modeling for different kinds of label censoring. We demonstrate with real and simulated experiments the effectiveness of AFT in XGBoost with respect to a number of baselines, in two respects: generalization performance and training speed. Furthermore, we take advantage of the support for NVIDIA GPUs in XGBoost to achieve substantial speedup over multi-core CPUs. To our knowledge, our work is the first implementation of AFT that utilizes the processing power of NVIDIA GPUs. Starting from the 1.2.0 release, the XGBoost package natively supports the AFT model. The addition of AFT in XGBoost has had significant impact in the open source community, and a few statistics packages now utilize the XGBoost AFT model.
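To make the native support concrete, the following is a minimal sketch of fitting an AFT model through XGBoost's survival:aft objective (available since release 1.2.0), using synthetic, illustrative data; the feature matrix, censoring pattern, and hyperparameter values here are placeholders, not results from the paper.

```python
# Minimal sketch of AFT training in XGBoost (objective 'survival:aft').
# Data and parameter values are illustrative only.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# AFT labels are ranges [lower, upper]:
#   uncensored point      -> lower == upper == observed time
#   right-censored point  -> lower == observed time, upper == +inf
y_lower = rng.uniform(1.0, 10.0, size=100)
censored = rng.random(100) < 0.3
y_upper = np.where(censored, np.inf, y_lower)

dtrain = xgb.DMatrix(X)
dtrain.set_float_info('label_lower_bound', y_lower)
dtrain.set_float_info('label_upper_bound', y_upper)

params = {
    'objective': 'survival:aft',
    'eval_metric': 'aft-nloglik',
    'aft_loss_distribution': 'normal',   # also 'logistic' or 'extreme'
    'aft_loss_distribution_scale': 1.0,
    'tree_method': 'hist',               # a GPU tree method can be used instead
    'learning_rate': 0.05,
}
bst = xgb.train(params, dtrain, num_boost_round=100,
                evals=[(dtrain, 'train')])
```

The interval form of the labels is what lets the same objective cover uncensored, right-, left-, and interval-censored observations, while the distribution parameters select the error distribution of the AFT model.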