Cross-sectional prediction is common in domains such as healthcare, for example in forecasting tasks on electronic health records, where different patients form a cross-section. We focus on constructing valid prediction intervals (PIs) in time-series regression with a cross-section. A prediction interval is considered valid if it covers the true response with a pre-specified high probability. We first distinguish between two notions of validity in this setting: cross-sectional and longitudinal. Cross-sectional validity concerns coverage across the cross-section of the time-series data, while longitudinal validity concerns coverage along the temporal dimension. Coverage guarantees along both dimensions are desirable; however, we show that distribution-free longitudinal validity is theoretically impossible. Despite this limitation, we propose Conformal Prediction with Temporal Dependence (CPTD), a procedure that maintains strict cross-sectional validity while improving longitudinal coverage. CPTD is post-hoc and lightweight, and can be used with any prediction model as long as a calibration set is available. We focus on neural networks because of their ability to model complicated data, such as diagnosis codes, for time-series regression, and we validate our approach with extensive experiments. We find that CPTD outperforms baselines on a variety of datasets by improving longitudinal coverage and often providing more efficient (narrower) PIs.
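To make the role of the calibration set concrete, the following is a minimal sketch of standard split conformal prediction with absolute-residual scores. It is not the CPTD procedure itself, whose details are not given in this abstract. The function name `split_conformal_interval` and its parameters are hypothetical, chosen for illustration only. The sketch assumes the calibration patients and the test patient are exchangeable at a given time step, which is the usual basis for cross-sectional coverage at level 1 - alpha.

```python
import numpy as np


def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Illustrative split conformal PI (hypothetical helper, not CPTD).

    cal_pred, cal_true: model predictions and true responses for the
        calibration patients at one time step.
    test_pred: point prediction(s) for the test patient(s).
    Returns (lower, upper) bounds targeting 1 - alpha coverage.
    """
    cal_pred = np.asarray(cal_pred, dtype=float)
    cal_true = np.asarray(cal_true, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)

    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.sort(np.abs(cal_true - cal_pred))
    n = scores.size

    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))

    # If the rank exceeds n, the calibration set is too small for this
    # alpha and the only valid interval is the whole real line.
    q = scores[k - 1] if k <= n else np.inf

    return test_pred - q, test_pred + q
```

Because the interval is built only from the model's predictions and held-out residuals, a wrapper of this kind is post-hoc: it can sit on top of any fitted regressor without retraining.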