Cross-sectional prediction is common in many domains such as healthcare, including forecasting tasks using electronic health records, where different patients form a cross-section. We focus on the task of constructing valid prediction intervals (PIs) in time series regression with a cross-section. A prediction interval is considered valid if it covers the true response with a pre-specified high probability. We first distinguish between two notions of validity in such a setting: cross-sectional and longitudinal. Cross-sectional validity concerns coverage across the cross-section of the time series data (for example, across patients), while longitudinal validity concerns coverage along the temporal dimension of a single series. Coverage guarantees along both dimensions are desirable; however, we show that distribution-free longitudinal validity is theoretically impossible. Despite this limitation, we propose Conformal Prediction with Temporal Dependence (CPTD), a procedure that maintains strict cross-sectional validity while improving longitudinal coverage. CPTD is post-hoc and lightweight, and can be used in conjunction with any prediction model as long as a calibration set is available. We focus on neural networks due to their ability to model complicated data such as diagnosis codes for time series regression, and perform extensive experimental validation to verify the efficacy of our approach. We find that CPTD outperforms baselines on a variety of datasets by improving longitudinal coverage and often providing more efficient (narrower) PIs.
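To make the notion of cross-sectional validity concrete, the following is a minimal sketch of standard split conformal prediction applied independently at each time step, using a calibration set of held-out series. It is not the CPTD procedure itself, which the abstract does not specify in detail. The function names, the array shapes, and the choice of absolute residuals as the nonconformity score are all assumptions made for illustration.

```python
import numpy as np


def conformal_halfwidth(abs_residuals, alpha):
    """Conformal quantile of absolute calibration residuals.

    Returns the k-th smallest score with k = ceil((n + 1) * (1 - alpha)),
    or infinity when the calibration set is too small for the target level.
    """
    scores = np.sort(np.asarray(abs_residuals, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else float(scores[k - 1])


def cross_sectional_intervals(cal_y, cal_pred, test_pred, alpha=0.1):
    """Per-time-step split conformal intervals for one test series.

    cal_y, cal_pred: arrays of shape (n_cal, T) holding the responses and
        model predictions for the calibration series (hypothetical layout).
    test_pred: array of shape (T,) with the model predictions for a new series.
    Returns (lower, upper), each of shape (T,).
    """
    cal_abs = np.abs(np.asarray(cal_y, dtype=float) - np.asarray(cal_pred, dtype=float))
    test_pred = np.asarray(test_pred, dtype=float)
    # One half-width per time step, calibrated across the cross-section.
    widths = np.array(
        [conformal_halfwidth(cal_abs[:, t], alpha) for t in range(cal_abs.shape[1])]
    )
    return test_pred - widths, test_pred + widths
```

Under the assumption that calibration and test series are exchangeable, each interval covers the true response at its time step with probability at least 1 - alpha, where the probability is taken over series. This is a cross-sectional guarantee only: it says nothing about how often the intervals for one fixed series cover over time, which is the longitudinal notion discussed above.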