A prevalent feature of high-dimensional data is dependence among covariates, and model selection is known to be challenging when covariates are highly correlated. To perform model selection for the high-dimensional Cox proportional hazards model in the presence of correlated covariates with a factor structure, we propose a new model, the Factor-Augmented Regularized Model for Hazard Regression (FarmHazard), which builds upon latent factors that drive covariate dependence and extends Cox's model. This new model gives rise to two-step procedures: they first learn the latent factors and idiosyncratic components from the high-dimensional covariate vectors and then use them as new predictors. Cox's model is a widely used semi-parametric model for survival analysis, in which censored data and time-dependent covariates pose additional technical challenges. We prove model selection consistency and estimation consistency under mild conditions. We also develop a factor-augmented variable screening procedure to deal with strong correlations in ultra-high-dimensional problems. Extensive simulations and real-data experiments demonstrate that our procedures perform well and achieve better results than alternative methods in model selection, out-of-sample C-index, and screening.
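The two-step idea (learn factors and idiosyncratic components, then use them as predictors in a penalized Cox model) can be illustrated with a minimal sketch. Everything concrete here is an assumption of the sketch, not the authors' implementation: factors are estimated by PCA with a user-chosen number `K`, the partial likelihood uses the Breslow form with no tied event times, the penalty is a plain Lasso applied only to the idiosyncratic coefficients, and the fit uses proximal gradient descent with backtracking. The function names (`estimate_factors`, `cox_loss_grad`, `fit_lasso_cox`, `farm_cox`) and the value of `lam` are hypothetical, and time-dependent covariates and the screening step are omitted.

```python
import numpy as np

def estimate_factors(X, K):
    """Step 1 (sketch): PCA estimates of K factors F (n x K), loadings B (p x K)
    and idiosyncratic components U = X - F B^T. X is assumed column-centred."""
    n = X.shape[0]
    _, eigvecs = np.linalg.eigh(X @ X.T)          # eigenvalues in ascending order
    F = np.sqrt(n) * eigvecs[:, ::-1][:, :K]      # top-K eigenvectors, so F^T F / n = I
    B = X.T @ F / n                               # least-squares loadings given F
    U = X - F @ B.T                               # idiosyncratic part, orthogonal to F
    return F, B, U

def cox_loss_grad(Z, event, coef):
    """Negative log partial likelihood / n and its gradient (Breslow form).
    Rows of Z and event must be sorted by decreasing time; assumes no tied times,
    so the risk set of row i is rows 0..i."""
    n = Z.shape[0]
    eta = Z @ coef
    log_risk = np.logaddexp.accumulate(eta)       # log sum_{j in R_i} exp(eta_j)
    loss = -np.sum(event * (eta - log_risk)) / n
    w = np.exp(eta - eta.max())                   # shifted for numerical stability
    risk_mean = np.cumsum(w[:, None] * Z, axis=0) / np.cumsum(w)[:, None]
    grad = -(event[:, None] * (Z - risk_mean)).sum(axis=0) / n
    return loss, grad

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_lasso_cox(Z, time, event, lam, penalized, n_iter=500):
    """Step 2 (sketch): proximal gradient (ISTA) with backtracking for an
    L1-penalised Cox model; columns with penalized=False are left unpenalised."""
    order = np.argsort(-time)
    Z, event = Z[order], event[order].astype(float)
    coef = np.zeros(Z.shape[1])
    thresh = lam * penalized.astype(float)
    step = 1.0
    loss, grad = cox_loss_grad(Z, event, coef)
    for _ in range(n_iter):
        while True:
            cand = soft_threshold(coef - step * grad, step * thresh)
            cand_loss, cand_grad = cox_loss_grad(Z, event, cand)
            diff = cand - coef
            # Accept once the quadratic upper bound holds; otherwise shrink the step.
            if cand_loss <= loss + grad @ diff + diff @ diff / (2 * step) + 1e-12:
                break
            step *= 0.5
        coef, loss, grad = cand, cand_loss, cand_grad
    return coef

def farm_cox(X, time, event, K, lam):
    """Two-step factor-augmented fit: learn (F, U), then regress on [F, U],
    penalising only the idiosyncratic coefficients (an assumption of this sketch)."""
    F, _, U = estimate_factors(X - X.mean(axis=0), K)
    Z = np.hstack([F, U])
    penalized = np.r_[np.zeros(K, dtype=bool), np.ones(U.shape[1], dtype=bool)]
    coef = fit_lasso_cox(Z, time, event, lam, penalized)
    return coef[:K], coef[K:]

# Toy data with factor-driven correlation (illustrative only).
rng = np.random.default_rng(0)
n, p, K = 300, 50, 2
X = rng.standard_normal((n, K)) @ rng.standard_normal((K, p)) + rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [1.0, -1.0, 0.8]
T = rng.exponential(1.0 / np.exp(X @ beta))       # hazard exp(x^T beta), constant baseline
C = rng.exponential(2.0, size=n)                  # independent censoring
time, event = np.minimum(T, C), (T <= C).astype(float)
gamma_hat, beta_hat = farm_cox(X, time, event, K=K, lam=0.05)
print(np.flatnonzero(beta_hat))                   # indices of selected covariates
```

Regressing on `[F, U]` instead of the raw covariates means the penalized columns have had the shared factor variation removed (by construction `F^T U = 0`), which is the motivation for the factor augmentation. In practice `K` and `lam` would need to be tuned, for example by cross-validation; the values above are arbitrary.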