IMPORTANCE: Feature selection for time-to-event outcomes is a fundamental problem in clinical trials and biomarker discovery studies. However, it is unclear which statistical methods should be used when the sample size is small or when some key covariates are not measured. DESIGN: In this simulation study, the true models are multivariate Cox proportional hazards models with 10 covariates. For all model fitting, only 5 of the 10 true features are assumed to be observed (measured), and 5 random noise features are added alongside them. Each sample size scenario is explored using 10,000 simulated datasets. Eight regression models are applied to each dataset to estimate feature effects, including both regularized Gaussian regression (elastic net penalty) and regularized Cox regression (glmnet Cox). RESULTS: When the covariates are highly correlated Gaussian variables, Gaussian regression of log-transformed survival time with only two covariates outperforms all tested Cox regression models when the total number of events is below 500.
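The simulation design described above can be illustrated with a minimal Python sketch of a single dataset. It is not the study's code: the exponential baseline hazard, the equicorrelation value `rho`, the effect sizes `beta`, the absence of censoring, and the function names `simulate_dataset` and `fit_log_time_ols` are all assumptions made for illustration only.

```python
import numpy as np


def simulate_dataset(n, rng, rho=0.8, beta=None, baseline_hazard=0.1):
    """Draw one dataset from a Cox model with 10 correlated Gaussian covariates.

    Assumptions (not from the source): equicorrelated covariates,
    a constant baseline hazard, and no censoring.
    """
    n_true, n_observed, n_noise = 10, 5, 5
    if beta is None:
        beta = np.full(n_true, 0.5)  # hypothetical log-hazard ratios

    # Equicorrelated Gaussian covariates with unit variance.
    cov = np.full((n_true, n_true), rho)
    np.fill_diagonal(cov, 1.0)
    x_true = rng.multivariate_normal(np.zeros(n_true), cov, size=n)

    # Inverse-transform sampling under a constant baseline hazard:
    # T = -log(U) / (lambda0 * exp(x . beta)).
    u = rng.uniform(size=n)
    time = -np.log(u) / (baseline_hazard * np.exp(x_true @ beta))

    # Only 5 of the 10 true features are observed, plus 5 pure-noise features.
    noise = rng.standard_normal((n, n_noise))
    x_observed = np.hstack([x_true[:, :n_observed], noise])
    return x_observed, time


def fit_log_time_ols(x, time, columns=(0, 1)):
    """Ordinary least squares of log(time) on a subset of columns.

    Returns the intercept followed by one coefficient per selected column.
    """
    design = np.column_stack([np.ones(len(time)), x[:, list(columns)]])
    coef, *_ = np.linalg.lstsq(design, np.log(time), rcond=None)
    return coef


rng = np.random.default_rng(0)
x_obs, t = simulate_dataset(500, rng)
coef = fit_log_time_ols(x_obs, t)
print(coef)
```

Because a larger hazard means a shorter survival time, the coefficients from the log-time regression carry the opposite sign to the Cox log-hazard ratios. They also absorb part of the effect of the unobserved correlated covariates, so they should be read as feature rankings rather than as unbiased effect estimates.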