We propose a variational Bayesian proportional hazards model for prediction and variable selection in high-dimensional survival data. Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC whilst retaining its useful features: it provides excellent point estimates and offers a natural mechanism for variable selection via posterior inclusion probabilities. The performance of the proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, where we identify genes with pre-existing biological interpretations.
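As a rough illustration of what selection via posterior inclusion probabilities can look like in practice, the sketch below keeps every covariate whose inclusion probability exceeds a cut-off. The function name `select_variables`, the 0.5 threshold, and the probability values are assumptions made for this example; they are not taken from the proposed method or its results.

```python
def select_variables(inclusion_probs, threshold=0.5):
    """Return the indices of covariates whose posterior inclusion
    probability is strictly greater than `threshold`.

    The 0.5 default is an assumed convention for this sketch,
    not a value stated in the abstract.
    """
    return [j for j, p in enumerate(inclusion_probs) if p > threshold]


# Hypothetical inclusion probabilities for five covariates (made-up values).
gamma = [0.97, 0.03, 0.61, 0.12, 0.49]

selected = select_variables(gamma)
print(selected)
```

Raising the threshold gives a more conservative selection; for instance, `select_variables(gamma, threshold=0.9)` would keep only the first covariate in this toy input.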