This article considers ultrahigh-dimensional forecasting problems with survival response variables. We propose a two-step model averaging procedure to improve the accuracy of forecasting the true conditional mean of a survival response variable. The first step constructs a class of candidate models, each with low-dimensional covariates. To this end, we develop a feature screening procedure that separates active from inactive predictors through a marginal Buckley–James index and groups covariates with similar index values together to form candidate regression models with survival response variables. The proposed screening method can select active predictors under covariate-dependent censoring and enjoys sure screening consistency under mild regularity conditions. The second step finds the optimal model weights for averaging by adapting a delete-one cross-validation criterion, without the standard constraint that the weights sum to one. Theoretical results show that the delete-one cross-validation criterion asymptotically achieves the lowest possible forecasting loss. Numerical studies demonstrate that the proposed variable screening and model averaging procedures outperform existing methods.
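The two steps (screen and group, then average with delete-one cross-validation weights) can be illustrated with a small numerical sketch. It is not the authors' procedure and makes several assumptions: the responses are fully observed (censoring is not handled), the absolute marginal Pearson correlation stands in as a placeholder for the marginal Buckley–James index, each candidate model is fitted by ordinary least squares, and each weight is restricted to [0, 1], which is one common choice once the sum-to-one constraint is dropped. The helper names `group_by_index`, `loo_predictions` and `cv_weights` are hypothetical.

```python
import numpy as np


def group_by_index(index, n_keep, group_size):
    """Keep the n_keep covariates with the largest |index| and split them, in
    decreasing order of |index|, into consecutive groups of size group_size,
    so covariates with similar index values share a candidate model."""
    order = np.argsort(-np.abs(index))[:n_keep]
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


def loo_predictions(X, y):
    """Delete-one predictions of an OLS fit with intercept, via the identity
    y_i - yhat_(-i) = e_i / (1 - h_ii), so the model is fitted only once."""
    Xd = np.column_stack([np.ones(len(y)), X])
    XtX_inv = np.linalg.pinv(Xd.T @ Xd)
    leverage = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)  # diagonal of hat matrix
    resid = y - Xd @ (XtX_inv @ (Xd.T @ y))
    return y - resid / (1.0 - leverage)


def cv_weights(P, y, n_sweeps=1000):
    """Minimise the delete-one CV loss ||y - P w||^2 over w in [0, 1]^M
    (assumed box constraint; no sum-to-one constraint) by cyclic
    coordinate descent. Column m of P holds model m's delete-one predictions."""
    _, M = P.shape
    w = np.zeros(M)
    r = np.array(y, dtype=float)            # current residual y - P w
    col_sq = (P ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for m in range(M):
            if col_sq[m] == 0.0:
                continue
            # Exact minimiser along coordinate m, projected onto [0, 1].
            new = min(1.0, max(0.0, w[m] + P[:, m] @ r / col_sq[m]))
            r -= P[:, m] * (new - w[m])
            w[m] = new
    return w


# Toy data with fully observed responses (censoring is NOT modelled here).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * rng.standard_normal(n)

# Placeholder screening index: marginal correlation, NOT the Buckley-James index.
index = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
groups = group_by_index(index, n_keep=6, group_size=2)

# Step 2: delete-one predictions of each candidate model, then averaging weights.
P = np.column_stack([loo_predictions(X[:, g], y) for g in groups])
w = cv_weights(P, y)
y_hat_cv = P @ w                            # model-averaged delete-one predictions
```

Because each weight is bounded separately, the fitted weights need not sum to one. Coordinate descent is used only because it keeps the sketch dependency-free beyond NumPy; any bounded least-squares solver would minimise the same criterion.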