Should univariate Cox regression be used for feature selection with respect to time-to-event outcomes?

IMPORTANCE: Time-to-event outcomes are commonly used in clinical trials and biomarker discovery studies and have been analyzed primarily with Cox proportional hazards models. However, it is unclear which statistical models should be recommended for feature selection when time-to-event outcomes are of primary interest. OBJECTIVE: To explore whether Gaussian regression of log-transformed survival time can outperform Cox proportional hazards models in feature selection. DESIGN: In this simulation study, the true models are multivariate Cox proportional hazards models with 10 covariates. For all feature selection comparisons, it is assumed that only 5 of the 10 true features are observed/measured for model fitting, along with 5 random noise features. Each sample size and censoring rate scenario is explored using 10,000 simulated datasets. Different statistical models are applied to the same dataset to estimate feature effects. Model performance is compared using sensitivity, specificity, and accuracy of effect size ranking. RESULTS: When features are independent and the true models are multivariate Cox proportional hazards models, Gaussian regression of log-transformed survival time (the response variable) with only two covariates outperformed both the univariate Cox proportional hazards model and logistic regression in feature selection, with higher sensitivity, comparable specificity, and higher accuracy of effect size ranking, regardless of sample size and censoring rate. CONCLUSIONS AND RELEVANCE: This study demonstrates the importance of including Gaussian regression of log-transformed survival time in feature selection practice for time-to-event outcomes.
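The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the study's code, and several pieces are assumptions made only for illustration: a constant baseline hazard (so survival times are exponential given the covariates), standard normal independent features, hypothetical coefficient values in `true_betas`, no censoring, and a one-feature-at-a-time least-squares fit of log survival time whose slope t-statistic is used to rank features. The abstract's own Gaussian regression specification and its handling of censoring may differ.

```python
import math
import random


def simulate_cox(n, true_betas, n_observed, n_noise, baseline_hazard=1.0, seed=0):
    """Draw survival times from a Cox model with a constant baseline hazard.

    Returns only the first `n_observed` true features plus `n_noise` pure
    noise features, mimicking a setting where some true features are unmeasured.
    """
    rng = random.Random(seed)
    features, times = [], []
    for _ in range(n):
        x_true = [rng.gauss(0.0, 1.0) for _ in true_betas]
        x_noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        linear_predictor = sum(b * x for b, x in zip(true_betas, x_true))
        # Constant baseline hazard: T is exponential with rate h0 * exp(x . beta).
        rate = baseline_hazard * math.exp(linear_predictor)
        times.append(rng.expovariate(rate))
        features.append(x_true[:n_observed] + x_noise)
    return features, times


def slope_t_statistic(x, y):
    """t-statistic of the slope in a simple least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope / standard_error


# Hypothetical effect sizes for the 10 true covariates (not taken from the study).
true_betas = [1.0, -0.8, 0.6, -0.4, 0.2, 0.5, -0.5, 0.3, -0.3, 0.1]
features, times = simulate_cox(n=500, true_betas=true_betas,
                               n_observed=5, n_noise=5, seed=1)

log_times = [math.log(t) for t in times]
n_features = len(features[0])
t_stats = [slope_t_statistic([row[j] for row in features], log_times)
           for j in range(n_features)]

# Rank features by the magnitude of their t-statistic, strongest first.
ranking = sorted(range(n_features), key=lambda j: -abs(t_stats[j]))
print(ranking)
```

In this setup a positive Cox coefficient raises the hazard and shortens survival, so it shows up as a negative slope on the log-time scale. Columns 0 to 4 are the observed true features and columns 5 to 9 are noise, so sensitivity and specificity could be computed by checking which side of a chosen threshold each column falls on.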

Related content

Feature selection

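Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from an existing set of M features so that a specified metric of the system is optimized. It picks the most effective features from the original set in order to reduce the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.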
