Time series analysis is rapidly moving towards longer and more complex tasks. In recent years, fast approximate algorithms for discord search have been proposed to cope with the increasing size of time series. It is more interesting, however, to find fast exact solutions. In this research, we improved HOT SAX by exploiting two main ideas: the warm-up process, and the similarity between sequences that are close in time. The resulting algorithm, called HOT SAX Time (HST), has been validated on real and synthetic time series and compares favorably with HOT SAX, RRA, SCAMP, and DADD. The complexity of a discord search has been evaluated with a new indicator, the cost per sequence (cps), which allows searches on time series of different lengths to be compared. Numerical evidence suggests that two conditions affect the complexity of a discord search in a non-trivial way: the length of the discords, and the noise-to-signal ratio. For complex searches, HST can be more than 100 times faster than HOT SAX, which places it at the forefront of exact discord search.
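To make the terms "exact discord search" and "cost per sequence" concrete, the following is a minimal sketch of a brute-force exact search. It is neither HST nor HOT SAX. It rests on two assumptions not stated in the abstract: that the discord is the subsequence whose nearest non-overlapping neighbor is farthest away under z-normalized Euclidean distance, and that cps is the number of distance calls divided by the number of sequences. The function names `znorm` and `brute_force_discord` and the synthetic series are hypothetical.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a flat subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std < 1e-12:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def brute_force_discord(series, m):
    """Exact discord search by exhaustive comparison (illustrative baseline).

    Returns (position, nearest-neighbor distance, cps), where cps is
    ASSUMED here to mean distance calls divided by number of sequences.
    """
    n_seq = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_seq)]
    calls = 0
    best_pos, best_dist = -1, -1.0
    for i in range(n_seq):
        nearest = math.inf
        for j in range(n_seq):
            if abs(i - j) < m:
                continue  # skip overlapping sequences (self-matches)
            calls += 1
            d = math.dist(subs[i], subs[j])
            if d < nearest:
                nearest = d
        # The discord is the sequence with the largest nearest-neighbor distance.
        if nearest != math.inf and nearest > best_dist:
            best_pos, best_dist = i, nearest
    return best_pos, best_dist, calls / n_seq


# Synthetic example: a sine wave of period 20 with a short bump injected.
series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
for t in range(100, 105):
    series[t] += 1.5

pos, dist, cps = brute_force_discord(series, 20)
print(f"discord at {pos}, nn distance {dist:.3f}, cps {cps:.1f}")
```

Under this reading, the brute-force cps grows with the number of sequences, whereas a faster exact method would reach the same discord with far fewer distance calls per sequence. Normalizing by the number of sequences is what makes searches on series of different lengths comparable.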