Time series data play an important role in many applications, and their analysis reveals crucial information for understanding the underlying processes. Among the many important time series learning tasks, we focus here on semi-supervised learning based on a graph representation of the data. Two main aspects are involved in this task: a suitable distance measure to evaluate the similarities between time series, and a learning method to make predictions based on these distances. However, the relationship between the two aspects has never been studied systematically in the context of graph-based learning. We describe four different distance measures, including (Soft) DTW and MPDist, a distance measure based on the Matrix Profile, as well as four successful semi-supervised learning methods, including the graph Allen--Cahn method and a Graph Convolutional Neural Network. We then compare the performance of the algorithms on binary classification data sets, evaluating each of the chosen graph-based methods with every distance measure, and observe that the accuracy varies strongly across combinations. As predicted by the ``no free lunch'' theorem, no single combination emerges as clearly best in all cases. Our study provides a reproducible framework for future work on semi-supervised learning for time series with a focus on graph representations.
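The two ingredients named in the abstract, a distance measure between time series and a graph-based learner that predicts from those distances, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the study: plain DTW with a squared pointwise cost stands in for the four distance measures, a dense Gaussian-kernel graph with a hand-picked bandwidth `sigma` stands in for the graph construction, and harmonic-function label propagation is swapped in as a simple stand-in learner. It is neither the graph Allen--Cahn method nor a Graph Convolutional Neural Network. The function names `dtw_distance`, `similarity_graph`, and `harmonic_labels`, and the toy data, are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """Plain dynamic time warping distance with squared pointwise cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])


def similarity_graph(series, sigma=1.0):
    """Dense Gaussian-kernel weight matrix from pairwise DTW distances.

    sigma is an assumed bandwidth; in practice it would need tuning.
    """
    k = len(series)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])
    W = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def harmonic_labels(W, labeled_idx, labeled_y):
    """Harmonic-function label propagation for binary labels in {-1, +1}.

    Stand-in learner: solves L_uu f_u = W_ul f_l on the unlabeled nodes,
    where L = D - W is the unnormalized graph Laplacian.
    """
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    labeled_y = np.asarray(labeled_y, dtype=float)
    unlabeled = np.setdiff1d(np.arange(n), labeled_idx)
    L = np.diag(W.sum(axis=1)) - W
    f = np.zeros(n)
    f[labeled_idx] = labeled_y
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    W_ul = W[np.ix_(unlabeled, labeled_idx)]
    f[unlabeled] = np.linalg.solve(L_uu, W_ul @ labeled_y)
    return np.sign(f), f


# Toy data: three phase-shifted sine waves (class +1) and three ramps (class -1).
t = np.linspace(0.0, 2.0 * np.pi, 30)
series = [np.sin(t + p) for p in (0.0, 0.3, 0.6)]
series += [t / np.pi - 1.0 + o for o in (0.0, 0.1, 0.2)]

W = similarity_graph(series, sigma=1.0)
# Only one example per class is labeled; the rest are inferred from the graph.
pred, scores = harmonic_labels(W, labeled_idx=[0, 3], labeled_y=[1, -1])
print(pred)
```

Swapping `dtw_distance` for another measure, or `harmonic_labels` for another graph-based learner, leaves the rest of the pipeline unchanged, which is the kind of distance/method pairing the abstract sets out to compare.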