We present a simple yet novel time series imputation technique, tSMOTE, with the goal of constructing an irregular time series whose observation times are uniform across every sample in a data set. Specifically, we fix a grid defined by the midpoints of non-overlapping bins (dubbed "slices") of observation times and ensure that each sample has values for all of the features at each grid time. This allows one both to impute fully missing observations, enabling uniform time series classification across the entire data set, and, in special cases, to impute individually missing features. To do so, we slightly generalize the well-known class imbalance algorithm SMOTE \cite{smote} to allow component-wise nearest neighbor interpolation that preserves correlations when there are no missing features. We visualize the method in the simplified setting of 2-dimensional uncoupled harmonic oscillators. Next, we use tSMOTE to train an encoder/decoder long short-term memory (LSTM) model with logistic regression for predicting and classifying distinct trajectories of different 2D oscillators. After illustrating the utility of tSMOTE in this context, we use the same architecture to train a clinical model for COVID-19 disease severity on an imputed data set. Our experiments show an improvement over standard mean and median imputation techniques by allowing a wider class of patient trajectories to be recognized by the model, as well as an improvement over aggregated classification models.
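As a rough illustration of the two ingredients named above, the following Python sketch computes slice midpoints from a set of bin edges and performs a classic SMOTE-style interpolation between a point and one of its nearest neighbors. It is a minimal sketch under stated assumptions and not the tSMOTE algorithm itself: the function names (`slice_midpoints`, `assign_slices`, `smote_interpolate`), the choice of bin edges, the Euclidean distance, and the single scalar interpolation weight are all hypothetical choices made for illustration, and the component-wise generalization described in the abstract is not reproduced here.

```python
import numpy as np


def slice_midpoints(edges):
    """Return the midpoint of each non-overlapping bin defined by sorted edges."""
    edges = np.asarray(edges, dtype=float)
    return (edges[:-1] + edges[1:]) / 2.0


def assign_slices(times, edges):
    """Map each observation time to the index of the slice that contains it.

    Assumption: bins are half-open [left, right), except that a time equal
    to the final edge is placed in the last slice.
    """
    edges = np.asarray(edges, dtype=float)
    idx = np.searchsorted(edges, times, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)


def smote_interpolate(x, pool, rng, k=1):
    """Classic SMOTE step: draw a synthetic point on the segment between x
    and one of its k nearest neighbors (Euclidean distance) in pool."""
    x = np.asarray(x, dtype=float)
    pool = np.asarray(pool, dtype=float)
    distances = np.linalg.norm(pool - x, axis=1)
    neighbors = pool[np.argsort(distances)[:k]]
    neighbor = neighbors[rng.integers(len(neighbors))]
    gap = rng.random()  # single scalar weight in [0, 1), shared by all features
    return x + gap * (neighbor - x)


# Hypothetical usage: three slices over the observation window [0, 8].
edges = [0.0, 2.0, 4.0, 8.0]
grid = slice_midpoints(edges)
slices = assign_slices([0.5, 2.0, 7.9, 8.0], edges)
rng = np.random.default_rng(0)
synthetic = smote_interpolate([0.0, 0.0], [[1.0, 1.0], [5.0, 5.0]], rng, k=1)
```

Using one scalar weight for all features keeps the synthetic point on the line segment between the two real points, which is one simple way to respect the correlations between features; how tSMOTE handles samples with missing features is not specified in this abstract and is left out of the sketch.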