The most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series, Euclidean Distance and Dynamic Time Warping distance are known to be extremely effective. However, for time series containing cyclical behaviors, the semantic meaningfulness of such comparisons is less clear. For example, on two separate days the telemetry from an athlete's workout routine might be very similar. On the second day, however, the athlete may change the order of push-ups and squats, add repetitions of pull-ups, or completely omit dumbbell curls. Any of these minor changes would defeat existing time series distance measures. Some bag-of-features methods have been proposed to address this problem, but we argue that in many cases, similarity is intimately tied to the shapes of subsequences within these longer time series. In such cases, summative features will lack discrimination ability. In this work we introduce PRCIS, which stands for Pattern Representation Comparison in Series. PRCIS is a distance measure for long time series that exploits recent progress in our ability to summarize time series with dictionaries. We will demonstrate the utility of our ideas on diverse tasks and datasets.
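The workout example can be made concrete with a small sketch. It does not implement PRCIS; it only illustrates, on synthetic data, how reordering the same subsequences inflates Euclidean Distance and Dynamic Time Warping distance. The signals `pushups` and `squats`, the helper names `euclidean` and `dtw`, and the choice of an unconstrained DTW with squared pointwise cost are all assumptions made for illustration.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def dtw(a, b):
    """Textbook unconstrained DTW with squared pointwise cost (assumed variant)."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

# Two hypothetical "exercises" with clearly different shapes.
t = np.linspace(0.0, 1.0, 50, endpoint=False)
pushups = np.sin(2 * np.pi * 5 * t)        # fast sine bursts
squats = 2.0 * ((3.0 * t) % 1.0) - 1.0     # slower sawtooth ramps

day1 = np.concatenate([pushups, squats])   # push-ups, then squats
day2 = np.concatenate([squats, pushups])   # same exercises, order swapped

print("Euclidean(day1, day1):", euclidean(day1, day1))
print("Euclidean(day1, day2):", euclidean(day1, day2))
print("DTW(day1, day1):      ", dtw(day1, day1))
print("DTW(day1, day2):      ", dtw(day1, day2))
```

Both days contain exactly the same subsequences, yet both measures report a nonzero distance once the order changes, because each compares the series position by position (Euclidean) or along a monotone alignment (DTW).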