Imputing missing values in time series has many applications in healthcare and finance. While autoregressive models are natural candidates for time series imputation, score-based diffusion models have recently outperformed existing counterparts, including autoregressive models, in many tasks such as image generation and audio synthesis, and are therefore promising for time series imputation. In this paper, we propose Conditional Score-based Diffusion models for Imputation (CSDI), a novel time series imputation method that uses score-based diffusion models conditioned on observed data. Unlike existing score-based approaches, the conditional diffusion model is explicitly trained for imputation and can exploit correlations between observed values. On healthcare and environmental data, CSDI improves by 40-65% over existing probabilistic imputation methods on popular performance metrics. In addition, deterministic imputation by CSDI reduces the error by 5-20% compared to state-of-the-art deterministic imputation methods. Furthermore, CSDI can also be applied to time series interpolation and probabilistic forecasting, and is competitive with existing baselines. The code is available at https://github.com/ermongroup/CSDI.
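To make "explicitly trained for imputation" and "conditioned on observed data" concrete, the following is a minimal sketch of how one training example for such a model might be built. It is not the authors' implementation. It assumes a self-supervised setup in which some observed entries are held out as imputation targets, a standard forward-diffusion noising step, and a linear noise schedule. The function names, `target_ratio`, and the schedule values are all hypothetical.

```python
import math
import random


def linear_alpha_bar(num_steps=50, beta_start=1e-4, beta_end=0.5):
    """Cumulative products of (1 - beta_t) for a linear schedule (values assumed)."""
    alpha_bar, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def make_training_example(x, observed, t, alpha_bar, target_ratio=0.5, rng=None):
    """Split observed entries into conditioning values and noised targets.

    x        : list of floats (entries at unobserved positions are ignored)
    observed : list of bools, True where x was actually measured
    t        : diffusion step index into alpha_bar
    """
    rng = rng or random.Random()
    # Hold out a random subset of the observed entries as imputation targets.
    is_target = [obs and rng.random() < target_ratio for obs in observed]
    # The remaining observed entries are what the model is conditioned on.
    is_cond = [obs and not tgt for obs, tgt in zip(observed, is_target)]
    # Forward diffusion on the targets only:
    #   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    a = alpha_bar[t]
    noisy = [
        math.sqrt(a) * xi + math.sqrt(1.0 - a) * e if tgt else 0.0
        for xi, e, tgt in zip(x, eps, is_target)
    ]
    cond = [xi if c else 0.0 for xi, c in zip(x, is_cond)]
    return cond, is_cond, noisy, is_target, eps


if __name__ == "__main__":
    series = [0.5, -1.2, 0.3, 2.0, -0.7, 1.1]
    mask = [True, True, False, True, True, True]  # third value is missing
    example = make_training_example(series, mask, t=10,
                                    alpha_bar=linear_alpha_bar(),
                                    rng=random.Random(0))
    for name, values in zip(("cond", "is_cond", "noisy", "is_target", "eps"), example):
        print(name, values)
```

Under these assumptions, a denoising network would receive `cond`, `is_cond`, `noisy`, and `t`, and would be trained to predict `eps` at the positions where `is_target` is true. At inference time, all observed values would serve as the conditioning input and the truly missing entries would be sampled.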