Currently, discovering subsequence anomalies in time series remains one of the most topical research problems. A subsequence anomaly refers to successive points in time that are collectively abnormal, although each point is not necessarily an outlier. Among a large number of approaches to discovering subsequence anomalies, the discord concept is considered one of the best. A time series discord is intuitively defined as a subsequence of a given length that is maximally far away from its non-overlapping nearest neighbor. Recently introduced the MERLIN algorithm discovers time series discords of every possible length in a specified range, thereby eliminating the need to set even that sole parameter to discover discords in a time series. However, MERLIN is serial and its parallelization could increase the performance of discords discovery. In this article, we introduce a novel parallelization scheme for GPUs, called PALMAD, Parallel Arbitrary Length MERLIN-based Anomaly Discovery. As opposed to its serial predecessor, PALMAD employs recurrent formulas we have derived to avoid redundant calculations, and advanced data structures for the efficient implementation of parallel processing. Experimental evaluation over real-world and synthetic time series shows that our algorithm outperforms parallel analogs. We also apply PALMAD to discover anomalies in a real-world time series employing our proposed discord heatmap technique to illustrate the results.
翻译:目前,发现时间序列中的子序列异常仍然是最热门的研究问题之一。子序列异常指的是在时间上连续的点被认为是异常的,尽管每个点本身不一定是离群值。在众多发现子序列异常的方法中,不和谐概念被认为是最好的之一。时间序列不和谐可以直观地定义为给定长度的子序列,它与其非重叠最近邻元素的距离是最大的。最近提出的MERLIN算法可以在指定范围内发现每个可能长度的时间序列不和谐,从而消除了在发现时间序列不和谐时设置任何参数的需求。然而,MERLIN是串行的,其并行化可以增加不和谐发现的性能。在本文中,我们介绍了一种新的GPU并行化方案,称为PALMAD,即基于MERLIN的任意长度的并行异常发现。与其串行前身相反,PALMAD采用我们推导出的递归公式避免了冗余计算,并采用先进的数据结构来实现高效的并行处理。对真实世界和合成时间序列的实验评估表明,我们的算法优于并行模拟。我们还应用PALMAD来发现实际时间序列中的异常情况,采用我们提议的不和谐热图技术来说明结果。