The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies: (conditional) diffusion models as state-of-the-art generative models, and structured state space models as the internal model architecture, which are particularly suited to capturing long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.
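To make the notion of a missingness scenario concrete, the following minimal sketch builds two observation masks for a multivariate time series. It assumes that "blackout missing" means one contiguous window is unobserved in every channel at once, in contrast to values dropped independently at random. The function names, the 1 = observed / 0 = missing convention, and all parameters are hypothetical choices for illustration, not the authors' code.

```python
import random


def random_missing_mask(length, channels, missing_ratio, rng):
    """Mask (1 = observed, 0 = missing) with each entry dropped independently."""
    return [
        [0 if rng.random() < missing_ratio else 1 for _ in range(channels)]
        for _ in range(length)
    ]


def blackout_missing_mask(length, channels, gap_length, rng):
    """Mask where one contiguous window is missing in all channels (assumed definition)."""
    if not 0 < gap_length <= length:
        raise ValueError("gap_length must satisfy 0 < gap_length <= length")
    # Pick the first time step of the gap so that the window fits in the series.
    start = rng.randrange(length - gap_length + 1)
    return [
        [0 if start <= t < start + gap_length else 1 for _ in range(channels)]
        for t in range(length)
    ]


rng = random.Random(0)  # fixed seed so the example is reproducible
rm_mask = random_missing_mask(length=100, channels=3, missing_ratio=0.2, rng=rng)
bo_mask = blackout_missing_mask(length=100, channels=3, gap_length=20, rng=rng)

# Time steps at which every channel is missing.
blackout_steps = [t for t, row in enumerate(bo_mask) if sum(row) == 0]
print(len(blackout_steps), blackout_steps[0], blackout_steps[-1])
```

Under the blackout mask, an imputation model has no simultaneous observations from other channels to draw on inside the gap, which is one plausible reason this setting is harder than random missingness.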