The introduction of machine learning (ML) techniques to survival analysis has increased the flexibility of modeling approaches, and ML-based models have become state of the art. These models optimize their own cost functions, yet their performance is often evaluated using the concordance index (C-index). From a statistical learning perspective, it is therefore important to analyze the relationship between the optimizers of the C-index and those of the ML cost functions. We address this issue by providing C-index Fisher-consistency results and excess risk bounds for several commonly used cost functions in survival analysis. We identify conditions under which these cost functions are consistent, in the form of three nested families of survival models. We also study the general case where no model assumption is made and present a new, off-the-shelf method that is shown to be consistent with the C-index, although it is computationally expensive at inference. Finally, we perform limited numerical experiments with simulated data to illustrate our theoretical findings.
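For readers unfamiliar with the evaluation metric, the following is a minimal sketch of one common empirical estimator of the C-index for right-censored data (Harrell's estimator). It is an illustration only and is not taken from the paper: the function name `harrell_c_index`, its arguments, and the convention that a higher risk score should correspond to a shorter survival time are all assumptions, and the exact C-index definition analyzed in the paper may differ.

```python
def harrell_c_index(times, events, risk_scores):
    """Illustrative Harrell-style C-index (hypothetical helper, not the paper's code).

    times       : observed times (event or censoring time)
    events      : 1 if the event was observed, 0 if the subject was censored
    risk_scores : model outputs; ASSUMPTION: higher score = higher risk = shorter survival
    """
    n = len(times)
    if not (len(events) == n and len(risk_scores) == n):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    for i in range(n):
        # A pair (i, j) is comparable only if subject i has the strictly shorter
        # observed time AND that time is an actual event (not a censoring time).
        # Pairs with tied observed times are ignored in this sketch.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # ranking agrees with the observed order
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half

    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


if __name__ == "__main__":
    times = [1, 2, 3, 4]
    events = [1, 1, 0, 1]          # the third subject is censored
    perfect = [4, 3, 2, 1]         # risk decreases as survival time increases
    print(harrell_c_index(times, events, perfect))
```

The quadratic loop over pairs is kept for readability. The pairwise, non-smooth nature of this quantity is one reason models are usually trained on a different cost function and only evaluated with the C-index, which is the gap the abstract describes.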