Given data on a scalar random variable $Y$, a prediction set for $Y$ with miscoverage level $\alpha \in (0,1)$ is a set of values for $Y$ that contains a randomly drawn $Y$ with probability $1 - \alpha$. Among all prediction sets that satisfy this coverage property, the oracle prediction set is the one with the smallest volume. This paper develops methods for estimating such prediction sets, conditional on observed covariates, when $Y$ is censored or measured in intervals. First, we characterise the oracle prediction set under interval censoring and develop a consistent estimator of the shortest prediction interval that satisfies this coverage property. We then extend these consistency results to prediction sets that consist of multiple disjoint intervals. Second, we use conformal inference to construct a prediction set that achieves a particular notion of finite-sample validity under censoring and remains consistent as the sample size grows. This notion exploits exchangeability to obtain finite-sample coverage guarantees using a specially constructed conformity score function. The procedure accounts for the irreducible prediction uncertainty due to the stochastic nature of outcomes, the modelling uncertainty due to partial identification, and the sampling uncertainty that diminishes as the sample grows. We conduct a set of Monte Carlo simulations and an empirical application to data from the Current Population Survey. The results highlight the robustness and efficiency of the proposed methods.
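To make the idea of a shortest prediction interval under interval censoring concrete, the following is a minimal sketch that ignores covariates. It assumes each observation is recorded as a bracket $[Y_L, Y_U]$ known to contain the unobserved $Y$, and it searches for the shortest interval $[a, b]$ that contains at least $\lceil (1-\alpha) n \rceil$ of these brackets in full. Requiring the whole bracket to lie inside $[a, b]$ guarantees coverage of $Y$ wherever it falls within its bracket, so this is a conservative criterion. It is an illustrative assumption, not the estimator or conformity score developed in the paper, and the function name `shortest_bracket_interval` is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def shortest_bracket_interval(
    lower: Sequence[float], upper: Sequence[float], alpha: float
) -> Tuple[float, float]:
    """Hypothetical helper: shortest [a, b] containing at least
    ceil((1 - alpha) * n) whole brackets [lower[i], upper[i]].

    Illustrative sketch only: an observation counts as covered only if its
    entire censoring bracket lies inside [a, b] (a conservative criterion).
    """
    n = len(lower)
    if n == 0 or n != len(upper):
        raise ValueError("lower and upper must be non-empty and of equal length")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")

    k = math.ceil((1 - alpha) * n)  # number of brackets that must be covered
    best: Optional[Tuple[float, float]] = None

    # Any optimal left endpoint can be moved right until it hits some lower[i]
    # without losing coverage, so only those values need to be tried.
    for a in sorted(set(lower)):
        # Upper ends of the brackets whose left end is not below a.
        uppers = sorted(u for l, u in zip(lower, upper) if l >= a)
        if len(uppers) < k:
            continue  # too few brackets start at or after a
        b = uppers[k - 1]  # smallest b that covers k of those brackets
        if best is None or b - a < best[1] - best[0]:
            best = (a, b)

    # a = min(lower) keeps all n >= k brackets, so best is always set here.
    assert best is not None
    return best
```

For example, with brackets $[0,1]$, $[5,6]$, $[6,7]$, $[7,8]$ and $\alpha = 0.25$, three brackets must be covered; `shortest_bracket_interval([0, 5, 6, 7], [1, 6, 7, 8], 0.25)` returns `(5, 8)`, which is shorter than the interval `(0, 7)` anchored at the smallest lower bound. The search is quadratic in $n$, which is adequate for illustration but not tuned for large samples.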