This paper introduces a method that constructs valid and efficient lower predictive bounds (LPBs) for survival times with censored data. Traditional methods for survival analysis often assume a parametric model for the distribution of survival time as a function of the measured covariates, or assume that this conditional distribution is captured well by a non-parametric method such as random forests; however, these methods may undercover if their assumptions are not satisfied. In this paper, we build on recent work by Candès et al. (2021), which offers a more assumption-lean approach to the problem. Their approach first subsets the data to discard any data points with early censoring times and then uses a reweighting technique (namely, weighted conformal inference (Tibshirani et al., 2019)) to correct for the distribution shift introduced by this subsetting procedure. In our new method, instead of restricting to a fixed threshold for the censoring time when subsetting the data, we allow for a covariate-dependent and data-adaptive subsetting step, which is better able to capture the heterogeneity of the censoring mechanism. As a result, our method can lead to LPBs that are less conservative and more informative. We show that in the Type I right-censoring setting, if either the censoring mechanism or the conditional quantile of survival time is well estimated, our proposed procedure achieves approximately exact marginal coverage, and in the latter case we additionally have approximate conditional coverage. We evaluate the validity and efficiency of our proposed algorithm in numerical experiments, illustrating its advantage over competing methods. Finally, we apply our method to a real dataset to generate LPBs for users' active times on a mobile app.
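To make the subset-then-reweight idea concrete, the following is a minimal sketch of the fixed-threshold baseline described above, not the covariate-dependent procedure this paper proposes. It assumes a one-dimensional covariate, a user-supplied quantile estimate `q_hat`, and a user-supplied estimate `p_hat` of the probability that the censoring time is at least `c0` given the covariate. The function names, the score `q_hat(x) - min(T, c0)`, and the clipping of the bound to `[0, c0]` are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def weighted_quantile(scores, weights, test_weight, level):
    """Weighted conformal quantile with a point mass at +infinity.

    Returns the smallest score whose normalized cumulative weight reaches
    `level`, or +inf if the calibration weights alone never reach it.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cdf = np.cumsum(w) / (w.sum() + test_weight)
    idx = np.searchsorted(cdf, level, side="left")
    return s[idx] if idx < len(s) else np.inf


def fixed_threshold_lpb(x_cal, t_obs, c_cal, x_test, c0, q_hat, p_hat, alpha=0.1):
    """Sketch of a lower predictive bound using a fixed censoring threshold c0.

    x_cal, t_obs, c_cal: calibration covariates, censored times min(T, C),
                         and censoring times C (1-D arrays).
    q_hat: hypothetical callable, array of covariates -> quantile estimates.
    p_hat: hypothetical callable, array of covariates -> P(C >= c0 | X).
    """
    x_cal, t_obs, c_cal = map(np.asarray, (x_cal, t_obs, c_cal))

    # Step 1: discard points censored before the threshold.
    keep = c_cal >= c0
    xs = x_cal[keep]
    # On the kept subset, min(t_obs, c0) equals min(T, c0), which is observed.
    target = np.minimum(t_obs[keep], c0)

    # Step 2: conformity scores and weights that undo the covariate shift
    # caused by subsetting (assumed form: inverse selection probability).
    scores = q_hat(xs) - target
    w_cal = 1.0 / p_hat(xs)
    x_new = np.atleast_1d(x_test)
    w_test = 1.0 / p_hat(x_new)[0]

    # Step 3: weighted conformal quantile, then invert the score.
    eta = weighted_quantile(scores, w_cal, w_test, 1.0 - alpha)
    if np.isinf(eta):
        return 0.0  # survival times are non-negative, so 0 is always valid
    return float(np.clip(q_hat(x_new)[0] - eta, 0.0, c0))
```

Because the bound targets `min(T, c0)`, it can never exceed `c0`. This is one way to see why a single fixed threshold can be conservative when the censoring mechanism varies with the covariates.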