The Cox proportional hazards (PH) model is one of the most popular models in biomedical data analysis. There have been continuing efforts to improve its flexibility for detecting complex signals, for example, via additive functions. Nevertheless, extending Cox additive models to accommodate high-dimensional data is nontrivial. When estimating additive functions, the commonly used group sparse regularization may introduce excess smoothing shrinkage on the additive functions, damaging predictive performance. Moreover, its "all-in-all-out" approach to functional selection makes it difficult to determine whether nonlinear effects exist. We develop an additive Cox PH model to address these challenges in high-dimensional data analysis. Notably, we impose a novel spike-and-slab LASSO prior that enables bi-level selection of additive functions. A scalable and deterministic algorithm, EM-Coordinate Descent, is designed for model fitting. We compare the predictive and computational performance of the proposed model against state-of-the-art models in simulation studies and a metabolomics data analysis. The proposed model is broadly applicable to various fields of research, e.g., genomics and population health, via the freely available R package BHAM (https://boyiguo1.github.io/BHAM/).
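As a rough sketch of the model class described above, the display below writes out a generic additive Cox PH hazard and a generic spike-and-slab LASSO prior. The notation is illustrative and assumed here, not taken from the source: the basis functions \phi_{jk}, the basis dimension K_j, the indicators \gamma_{jk}, and the scales s_0 and s_1 are all hypothetical names, and the exact bi-level prior used by the proposed model may differ in form.

```latex
% Additive Cox PH model (generic form): the linear predictor is replaced
% by a sum of smooth functions B_j, each expanded in an assumed basis.
h(t \mid x_i) = h_0(t) \exp\Big\{ \sum_{j=1}^{p} B_j(x_{ij}) \Big\},
\qquad
B_j(x) = \sum_{k=1}^{K_j} \beta_{jk}\, \phi_{jk}(x)

% Generic spike-and-slab LASSO prior on a basis coefficient (assumed form):
% a mixture of two double-exponential (DE) densities, where
% \gamma_{jk} = 0 selects the spike scale s_0 and \gamma_{jk} = 1 the slab scale s_1.
\beta_{jk} \mid \gamma_{jk} \sim
  (1 - \gamma_{jk})\, \mathrm{DE}(0, s_0) + \gamma_{jk}\, \mathrm{DE}(0, s_1),
\qquad 0 < s_0 < s_1
```

In this generic form, a small spike scale s_0 shrinks negligible coefficients strongly toward zero, while the larger slab scale s_1 leaves important coefficients only lightly penalized. This is one way such a prior can avoid the uniform shrinkage that a single group penalty applies to every function.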