Few Bayesian methods for analyzing high-dimensional sparse survival data provide scalable variable selection, effect estimation and uncertainty quantification. Existing methods often either sacrifice uncertainty quantification by computing maximum a posteriori estimates, or quantify the uncertainty at a computational expense that does not scale. We bridge this gap and develop an interpretable and scalable Bayesian proportional hazards model for prediction and variable selection, referred to as SVB. Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC whilst retaining its useful features: it provides a posterior distribution for the parameters and offers a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, and how the uncertainty quantification offered by our method can be used to provide an interpretable assessment of patient risk.
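To make the last two points concrete, the following is a minimal sketch of how posterior inclusion probabilities and a factorised posterior approximation could be used downstream. It is not the SVB implementation. It assumes, purely for illustration, that each coefficient has an independent spike-and-slab approximation with a Gaussian slab (mean `mu`, standard deviation `sigma`) and inclusion probability `gamma`; these names, the 0.5 selection threshold, and all numeric values are hypothetical.

```python
import numpy as np


def select_variables(gamma, threshold=0.5):
    """Return indices whose inclusion probability exceeds the threshold."""
    return np.flatnonzero(np.asarray(gamma) > threshold)


def sample_coefficients(mu, sigma, gamma, n_samples, rng):
    """Draw coefficients from an assumed factorised spike-and-slab form:
    beta_j = z_j * N(mu_j, sigma_j^2), with z_j ~ Bernoulli(gamma_j)."""
    mu, sigma, gamma = map(np.asarray, (mu, sigma, gamma))
    included = rng.random((n_samples, mu.size)) < gamma
    slab = rng.normal(mu, sigma, size=(n_samples, mu.size))
    return included * slab


def risk_interval(x, mu, sigma, gamma, n_samples=5000, level=0.95, seed=0):
    """Monte Carlo mean and credible interval for the linear predictor x @ beta,
    i.e. the log relative hazard of one patient in a proportional hazards model."""
    rng = np.random.default_rng(seed)
    beta = sample_coefficients(mu, sigma, gamma, n_samples, rng)
    eta = beta @ np.asarray(x)
    lower, upper = np.quantile(eta, [(1 - level) / 2, (1 + level) / 2])
    return eta.mean(), lower, upper


# Hypothetical fitted values for four features (illustration only).
mu = np.array([1.2, -0.8, 0.05, 0.0])
sigma = np.array([0.1, 0.15, 0.3, 0.2])
gamma = np.array([0.99, 0.95, 0.10, 0.02])
x = np.array([0.5, -1.0, 2.0, 1.0])  # one patient's feature vector

selected = select_variables(gamma)
mean_eta, lower, upper = risk_interval(x, mu, sigma, gamma)
print("selected features:", selected)
print(f"log relative hazard: {mean_eta:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

Because the approximation is assumed to factorise over coefficients, sampling is cheap and the interval for a patient's log relative hazard comes directly from the draws, which is the kind of interpretable risk assessment the abstract refers to.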