Variable selection is an important problem in statistics and machine learning. Copula Entropy (CE) is a mathematical concept for measuring statistical independence and has recently been applied to variable selection. In this paper, we propose applying the CE-based variable selection method to survival analysis. The idea is to measure the correlation between each variable and time-to-event with CE and then select variables according to their CE values. Experiments on simulated data and two real cancer datasets were conducted to compare the proposed method with two related methods: random survival forest and Lasso-Cox. The experimental results show that the proposed method selects the 'right' variables, which are more interpretable and lead to better prediction performance.
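The selection idea described above (score each variable by its CE with time-to-event, then keep the highest-scoring variables) can be illustrated with a minimal sketch. This is not the estimator used in the paper: it assumes a simple histogram estimate on rank-transformed data in place of a proper nonparametric CE estimator, it relies on the fact that CE equals negative mutual information, and it ignores censoring entirely. The names `rank_transform`, `negative_ce`, `select_variables`, and `simulate` are hypothetical and chosen only for illustration.

```python
import math
import random


def rank_transform(values):
    """Map values to (rank - 0.5) / n, i.e. onto the empirical copula scale."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = (rank - 0.5) / n
    return u


def negative_ce(x, t, bins=8):
    """Histogram estimate of negative copula entropy between x and t.

    Assumption: a coarse binned estimate stands in for a proper
    nonparametric CE estimator. Larger values mean stronger dependence.
    """
    n = len(x)
    u, v = rank_transform(x), rank_transform(t)
    counts = {}
    for a, b in zip(u, v):
        cell = (int(a * bins), int(b * bins))
        counts[cell] = counts.get(cell, 0) + 1
    # After the rank transform each margin is (near) uniform, so the
    # copula density in a cell is approximately p * bins**2.
    return sum((c / n) * math.log((c / n) * bins * bins)
               for c in counts.values())


def select_variables(columns, times, k):
    """Return the k variable names with the largest negative-CE score."""
    scores = {name: negative_ce(col, times) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def simulate(n, seed):
    """Toy data (hypothetical): x1 drives survival time, x2 is pure noise."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Uncensored event times whose scale depends on x1 only.
    times = [math.exp(-2.0 * a) * rng.expovariate(1.0) for a in x1]
    return {"x1": x1, "x2": x2}, times


if __name__ == "__main__":
    columns, times = simulate(400, seed=0)
    selected, scores = select_variables(columns, times, k=1)
    print(selected, scores)
```

On this toy data the informative variable should receive a clearly larger score than the noise variable. With real survival data, censored observations would need explicit handling, which this sketch does not attempt.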