Labeling time series data is expensive because it requires domain expertise and because the data are dynamic in nature. As a result, we often have to work in settings with limited labeled data. Data augmentation techniques have been successfully deployed in domains such as computer vision to make better use of existing labeled data. We adapt one of the most commonly used of these techniques, MixUp, to the time series domain. Our proposed methods, MixUp++ and LatentMixUp++, use simple modifications to perform interpolation in the raw time series and in the classification model's latent space, respectively. We also extend these methods with semi-supervised learning to exploit unlabeled data. With LatentMixUp++, we observe significant improvements of 1% to 15% in time series classification on two public datasets, in both low and high labeled data regimes.
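To make the interpolation idea concrete, the sketch below shows the standard MixUp operation applied to a batch of raw time series: each sample and its one-hot label are blended with a randomly chosen partner using a coefficient drawn from a Beta distribution. This is only the baseline technique, not the MixUp++ or LatentMixUp++ modifications, which the abstract does not spell out. The function name `mixup_batch`, the default `alpha=0.2`, and the toy data are assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import random


def mixup_batch(xs, ys, alpha=0.2, rng=None):
    """Standard mixup on one batch (illustrative sketch, not the paper's method).

    xs: list of equal-length series (lists of floats)
    ys: list of one-hot label vectors (lists of floats)
    alpha: Beta(alpha, alpha) concentration; 0.2 is an assumed default
    """
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    perm = list(range(len(xs)))
    rng.shuffle(perm)                    # random partner index for each sample

    def blend(a, b):
        # Element-wise convex combination of two equal-length vectors.
        return [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]

    x_mix = [blend(xs[i], xs[j]) for i, j in enumerate(perm)]
    y_mix = [blend(ys[i], ys[j]) for i, j in enumerate(perm)]
    return x_mix, y_mix, lam


# Toy batch: 4 univariate series of length 16, 2 classes (hypothetical data).
rng = random.Random(0)
xs = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
ys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
x_mix, y_mix, lam = mixup_batch(xs, ys, alpha=0.2, rng=rng)
```

The same blending could in principle be applied to hidden representations produced by an intermediate layer of the classifier instead of to the raw inputs, which is the general idea behind interpolating in latent space; how LatentMixUp++ does this in detail is not stated here.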