Existing survival analysis techniques rely heavily on strong modelling assumptions and are therefore prone to model misspecification errors. In this paper, we develop an inferential method based on ideas from conformal prediction, which can wrap around any survival prediction algorithm to produce calibrated, covariate-dependent lower predictive bounds on survival times. In the Type I right-censoring setting, when the censoring times are completely exogenous, the lower predictive bounds have guaranteed coverage in finite samples under no assumption other than that the data points are independent and identically distributed. Under a more general conditionally independent censoring assumption, the bounds satisfy a doubly robust property: marginal coverage is approximately guaranteed if either the censoring mechanism or the conditional survival function is estimated well. Further, we demonstrate that the lower predictive bounds remain valid and informative for other types of censoring. The validity and efficiency of our procedure are demonstrated on synthetic data and real COVID-19 data from the UK Biobank.
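To make the idea of a conformal lower predictive bound concrete, the following is a minimal sketch of a simple conservative baseline, not the procedure developed in the paper. It relies only on the fact that the censored time min(T, C) never exceeds the survival time T, so a one-sided split-conformal lower bound calibrated on censored times is also a valid lower bound for T. The function names, the simulated data, and the least-squares base predictor are all hypothetical choices made for illustration.

```python
import numpy as np


def conformal_lower_bound(predict, X_cal, T_obs_cal, X_new, alpha=0.1):
    """One-sided split-conformal lower bound calibrated on censored times."""
    # Conformity score: how far the base prediction overshoots the observed time.
    scores = predict(X_cal) - T_obs_cal
    n = len(scores)
    # Rank of the finite-sample-corrected (1 - alpha) empirical quantile.
    k = int(np.ceil((1 - alpha) * (n + 1)))
    if k > n:
        # Too few calibration points: fall back to the trivial bound 0.
        return np.zeros(len(X_new))
    eta = np.sort(scores)[k - 1]
    # Survival times are nonnegative, so clipping at 0 never reduces coverage.
    return np.maximum(predict(X_new) - eta, 0.0)


def simulate(n, rng):
    """Hypothetical data: log-normal survival times, exogenous censoring."""
    X = rng.uniform(0.0, 1.0, size=n)
    T = np.exp(X + 0.5 * rng.standard_normal(n))  # true survival time
    C = rng.exponential(scale=5.0, size=n)        # censoring time, independent of (X, T)
    return X, T, np.minimum(T, C)                 # only min(T, C) is observed


def demo(seed=0, alpha=0.1):
    rng = np.random.default_rng(seed)
    X_tr, _, Y_tr = simulate(1000, rng)    # training split for the base model
    X_cal, _, Y_cal = simulate(2000, rng)  # calibration split
    X_te, T_te, _ = simulate(2000, rng)    # test split; true T used only for evaluation
    # Any base predictor can be plugged in; here, a least-squares line (an assumption).
    coef = np.polyfit(X_tr, Y_tr, deg=1)
    predict = lambda x: np.polyval(coef, x)
    lower = conformal_lower_bound(predict, X_cal, Y_cal, X_te, alpha)
    # Empirical coverage: fraction of test points whose true T is at least the bound.
    return lower, float(np.mean(T_te >= lower))


if __name__ == "__main__":
    lower, coverage = demo()
    print(f"empirical coverage of the lower bound: {coverage:.3f}")
```

Because this baseline targets the censored time rather than the survival time itself, its bounds tend to be conservative when censoring is heavy, which is the kind of inefficiency a more refined procedure would aim to reduce.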