Missing data is a persistent challenge in time series analysis. It arises from problems such as data drops or sensor malfunctions. Imputation methods fill in the missing values, and the quality of the imputation has a significant impact on downstream tasks such as classification. In this work, we propose a semi-supervised imputation method, ST-Impute, that uses both unlabeled data and the downstream task's labeled data. ST-Impute is based on sparse self-attention and is trained on tasks that mimic the imputation process. Our results indicate that the proposed method outperforms existing supervised and unsupervised time series imputation methods, both in imputation quality and in the performance of downstream tasks that consume the imputed time series.
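The idea of training on tasks that mimic the imputation process can be illustrated with a generic masking objective: hide some values that were actually observed, ask an imputer to reconstruct them, and score it only on the hidden positions. The sketch below is an assumption-laden illustration of that general idea, not the ST-Impute training procedure; the function names (`mask_observed`, `masked_mae`, `forward_fill`), the zero placeholder, and the mean-absolute-error score are all hypothetical choices, and the forward-fill imputer is only a stand-in for a learned model such as a sparse self-attention network.

```python
import random

def mask_observed(series, observed, mask_rate, rng):
    """Hide a random subset of the observed entries to create a training task.

    series:   list of floats (values at truly missing positions are ignored)
    observed: list of bools, True where a real measurement exists
    Returns (inputs, input_observed, targets); targets maps index -> true value.
    """
    inputs = list(series)
    input_observed = list(observed)
    targets = {}
    for i, is_obs in enumerate(observed):
        if is_obs and rng.random() < mask_rate:
            targets[i] = series[i]      # remember the ground truth
            inputs[i] = 0.0             # placeholder for a hidden entry (assumed)
            input_observed[i] = False   # the imputer must not see this value
    return inputs, input_observed, targets

def masked_mae(predictions, targets):
    """Mean absolute error over the artificially hidden positions only."""
    if not targets:
        return 0.0
    return sum(abs(predictions[i] - v) for i, v in targets.items()) / len(targets)

def forward_fill(inputs, input_observed):
    """Stand-in imputer (NOT ST-Impute): carry the last observed value forward."""
    out, last = [], 0.0
    for x, ok in zip(inputs, input_observed):
        if ok:
            last = x
        out.append(last)
    return out

rng = random.Random(0)
series = [1.0, 1.5, 2.0, 0.0, 3.0, 3.5]
observed = [True, True, True, False, True, True]  # index 3 is truly missing

inputs, input_observed, targets = mask_observed(series, observed, 0.5, rng)
loss = masked_mae(forward_fill(inputs, input_observed), targets)
print(f"hidden positions: {sorted(targets)}, masked MAE: {loss:.3f}")
```

Because the score is computed only where ground truth was deliberately hidden, a task of this kind needs no labels and can therefore draw on unlabeled series. How ST-Impute combines such tasks with the downstream task's labeled data is not specified in the abstract and is not modeled here.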