The volume of time series data has exploded due to the popularity of new applications such as data center management and IoT. Subsequence matching is a fundamental task in mining time series data. All existing index-based approaches consider only raw subsequence matching (RSM) and do not support subsequence normalization. UCR Suite can handle the normalized subsequence matching (NSM) problem, but it must scan the full time series. In this paper, we propose a new problem, constrained normalized subsequence matching (cNSM), which adds constraints to the NSM problem. These constraints provide a knob to flexibly control the degree of offset shifting and amplitude scaling, which makes it possible to build an index to process queries. We propose a new index structure, KV-index, and a matching algorithm, KV-match. With a single index, our approach supports both the RSM and cNSM problems under either Euclidean distance (ED) or dynamic time warping (DTW). KV-index is a key-value structure that can be easily implemented on local files or HBase tables. To support queries of arbitrary length, we extend KV-match to KV-match$_{DP}$, which uses multiple indexes built with varied window lengths to process the query. We conduct extensive experiments on synthetic and real-world datasets, and the results verify the effectiveness and efficiency of our approach.
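To make the cNSM idea concrete, here is a minimal sketch of what a single cNSM query checks. It assumes the constraints take the form |mean(S) - mean(Q)| <= beta (offset shifting) and 1/alpha <= std(S)/std(Q) <= alpha (amplitude scaling), applied on top of an ED threshold epsilon between the z-normalized subsequence and query. This is a brute-force linear scan, not KV-index or KV-match; the function `cnsm_scan` and the parameter names `epsilon`, `alpha`, and `beta` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def mean_std(xs: Sequence[float]) -> Tuple[float, float]:
    """Population mean and standard deviation of a sequence."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, math.sqrt(var)


def z_normalize(xs: Sequence[float], mu: float, sd: float) -> List[float]:
    """Shift to zero mean and scale to unit standard deviation."""
    return [(x - mu) / sd for x in xs]


def cnsm_scan(series: Sequence[float], query: Sequence[float],
              epsilon: float, alpha: float, beta: float) -> List[Tuple[int, float]]:
    """Hypothetical brute-force cNSM check (full scan, no index).

    Returns (start, distance) for every subsequence S of length len(query) with
      |mean(S) - mean(Q)| <= beta           (assumed offset-shift constraint)
      1/alpha <= std(S)/std(Q) <= alpha     (assumed amplitude-scaling constraint, alpha >= 1)
      ED(znorm(S), znorm(Q)) <= epsilon
    """
    m = len(query)
    mu_q, sd_q = mean_std(query)
    if sd_q == 0:
        raise ValueError("query must not be constant")
    q_norm = z_normalize(query, mu_q, sd_q)
    matches = []
    for start in range(len(series) - m + 1):
        sub = series[start:start + m]
        mu_s, sd_s = mean_std(sub)
        if sd_s == 0:
            continue  # constant subsequences cannot be z-normalized; skipped here
        if abs(mu_s - mu_q) > beta:
            continue  # offset shift exceeds the allowed range
        if not (1 / alpha <= sd_s / sd_q <= alpha):
            continue  # amplitude scaling exceeds the allowed range
        s_norm = z_normalize(sub, mu_s, sd_s)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(s_norm, q_norm)))
        if dist <= epsilon:
            matches.append((start, dist))
    return matches


# Usage: the same shape appears three times in the series.
query = [1.0, 2.0, 3.0, 2.0, 1.0]
series = [1, 2, 3, 2, 1, 0, 0, 0,
          11, 12, 13, 12, 11, 0, 0, 0,  # same shape shifted up by 10
          1, 3, 5, 3, 1]                # same shape with doubled amplitude
strict = [s for s, _ in cnsm_scan(series, query, epsilon=1.0, alpha=1.5, beta=1.0)]
loose = [s for s, _ in cnsm_scan(series, query, epsilon=1.0, alpha=10.0, beta=100.0)]
# strict contains 0 but not 8 or 16; loose also contains 8 and 16
```

Plain NSM would accept all three occurrences, because their z-normalized forms are identical. Tightening alpha and beta rejects the shifted and rescaled copies, which is the kind of control the cNSM knob is meant to expose.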