IMPORTANCE: Time-to-event outcomes are commonly used in clinical trials and biomarker discovery studies and have been analyzed primarily with Cox proportional hazards models. However, it is unclear which statistical models should be recommended for feature selection when time-to-event outcomes are of primary interest. OBJECTIVE: To explore whether Gaussian regression of log-transformed survival time can outperform Cox proportional hazards models in feature selection. DESIGN: In this simulation study, the true models are multivariate Cox proportional hazards models with 10 covariates. For all feature selection comparisons, it is assumed that only 5 of the 10 true features are observed (measured) and available for model fitting, along with 5 random noise features. Each combination of sample size and censoring rate is explored using 10,000 simulated datasets. Different statistical models are applied to the same dataset to estimate feature effects. Model performance is compared using sensitivity, specificity, and accuracy of effect size ranking. RESULTS: When features are independent and the true models are multivariate Cox proportional hazards models, Gaussian regression of log-transformed survival time (the response variable) with only two covariates outperformed both the univariate Cox proportional hazards model and logistic regression in feature selection, achieving higher sensitivity, comparable specificity, and higher accuracy of effect size ranking, regardless of sample size and censoring rate. CONCLUSIONS AND RELEVANCE: This study demonstrates the importance of including Gaussian regression of log-transformed survival time in feature selection practice for time-to-event outcomes.
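The design described above can be illustrated with a minimal sketch. It is not the authors' code, and several choices are assumptions made only for illustration: a constant (exponential) baseline hazard, independent standard normal features, exponential censoring times, the particular coefficient values, and a per-feature ordinary least squares fit in which censored times are simply treated as observed times. The function names `simulate_cox_dataset` and `rank_features_by_log_time_ols` are hypothetical.

```python
import numpy as np


def simulate_cox_dataset(n, beta, n_observed=5, n_noise=5, censor_scale=5.0, seed=0):
    """Draw one dataset from a Cox model with an (assumed) unit exponential baseline.

    Only the first `n_observed` true features are returned, alongside
    `n_noise` pure-noise features, mirroring the partially observed design.
    """
    rng = np.random.default_rng(seed)
    x_true = rng.standard_normal((n, len(beta)))
    linear_predictor = x_true @ beta
    # With baseline hazard 1, the event time is Exp(1) divided by exp(linear predictor).
    t_event = rng.exponential(size=n) / np.exp(linear_predictor)
    # Assumed censoring mechanism: independent exponential censoring times.
    t_censor = rng.exponential(scale=censor_scale, size=n)
    time = np.minimum(t_event, t_censor)
    event = (t_event <= t_censor).astype(int)
    x_obs = np.hstack([x_true[:, :n_observed], rng.standard_normal((n, n_noise))])
    return x_obs, time, event


def rank_features_by_log_time_ols(x, time):
    """Fit log(time) ~ intercept + one feature at a time; return the slope t-statistics."""
    y = np.log(time)
    n = len(y)
    t_stats = np.empty(x.shape[1])
    for j in range(x.shape[1]):
        design = np.column_stack([np.ones(n), x[:, j]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        sigma2 = resid @ resid / (n - 2)
        cov = sigma2 * np.linalg.inv(design.T @ design)
        t_stats[j] = coef[1] / np.sqrt(cov[1, 1])
    return t_stats


# Illustrative coefficients for the 10 true covariates (assumed values).
beta = np.array([1.0, -1.0, 0.8, -0.8, 0.6, 0.5, -0.5, 0.4, -0.4, 0.3])
x_obs, time, event = simulate_cox_dataset(n=2000, beta=beta)
t_stats = rank_features_by_log_time_ols(x_obs, time)
ranking = np.argsort(-np.abs(t_stats))  # feature indices, strongest first
print("censoring rate:", 1 - event.mean())
print("ranking by |t|:", ranking)
```

Columns 0 to 4 of `x_obs` are the observed true features and columns 5 to 9 are noise, so sensitivity and specificity could be computed by thresholding `t_stats` and comparing against that known split. Because a higher hazard shortens survival, a feature with a positive Cox coefficient is expected to show a negative slope on the log-time scale.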