Time series are ubiquitous, yet inherently hard to analyze and, ultimately, to label or cluster. With the rise of the Internet of Things (IoT) and its smart devices, data is collected in large amounts every second. The collected data is rich in information: one can detect accidents (e.g., involving cars) in real time, or assess injury or sickness over a given time span (e.g., with health devices). Due to their chaotic nature and the massive number of data points, time series are hard to label manually. Furthermore, new classes can emerge within the data over time (in contrast to, e.g., handwritten digits), which would require relabeling the data. In this paper we present SuSL4TS, a deep generative Gaussian mixture model for semi-unsupervised learning, to classify time series data. Our approach alleviates manual labeling steps, since it can detect sparsely labeled classes (semi-supervised) and identify emerging classes hidden in the data (unsupervised). We demonstrate the efficacy of our approach on established time series classification datasets from different domains.