Imputing missing values in time series has many applications in healthcare and finance. While autoregressive models are natural candidates for time series imputation, score-based diffusion models have recently outperformed existing counterparts, including autoregressive models, in many tasks such as image generation and audio synthesis, and are therefore promising for time series imputation. In this paper, we propose Conditional Score-based Diffusion models for Imputation (CSDI), a novel time series imputation method that uses score-based diffusion models conditioned on observed data. Unlike existing score-based approaches, the conditional diffusion model is explicitly trained for imputation and can exploit correlations between observed values. On healthcare and environmental data, CSDI improves by 40-70% over existing probabilistic imputation methods on popular performance metrics. In addition, deterministic imputation by CSDI reduces the error by 5-20% compared to state-of-the-art deterministic imputation methods. Furthermore, CSDI can also be applied to time series interpolation and probabilistic forecasting, where it is competitive with existing baselines.
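To make the idea of a diffusion model "conditioned on observed data" and "explicitly trained for imputation" more concrete, the following is a minimal sketch of how one training example might be prepared. It rests on assumptions the abstract does not state: a DDPM-style forward process, a random split of observed values into conditioning values and imputation targets, and the hypothetical helper names `split_observed` and `noise_targets`. It illustrates the general technique and is not the authors' implementation.

```python
import math
import random


def split_observed(series, target_ratio, rng):
    """Split observed indices into conditioning indices and imputation targets.

    Assumption: targets are drawn uniformly at random from the observed
    entries; truly missing entries (None) are never used as targets.
    """
    observed = [i for i, v in enumerate(series) if v is not None]
    n_target = max(1, int(len(observed) * target_ratio))
    targets = set(rng.sample(observed, n_target))
    cond = [i for i in observed if i not in targets]
    return cond, sorted(targets)


def noise_targets(series, targets, alpha_bar, rng):
    """Apply a DDPM-style forward step to the target entries only.

    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1).
    Returns the noisy values and the noise that a denoiser would be
    trained to predict, given the untouched conditioning values.
    """
    eps = {i: rng.gauss(0.0, 1.0) for i in targets}
    noisy = {
        i: math.sqrt(alpha_bar) * series[i] + math.sqrt(1.0 - alpha_bar) * eps[i]
        for i in targets
    }
    return noisy, eps


rng = random.Random(0)
# Toy series; None marks a value that is missing in the raw data.
series = [1.0, None, 0.8, 1.2, None, 0.9, 1.1, 1.0]
cond_idx, target_idx = split_observed(series, target_ratio=0.5, rng=rng)
noisy, eps = noise_targets(series, target_idx, alpha_bar=0.5, rng=rng)
```

In this sketch a denoising network (not shown) would receive the values at `cond_idx` unchanged together with the noisy values at `target_idx`, and would be trained to recover `eps`. At inference time the truly missing entries would take the role of the targets.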