Analyzing numerous or long time series is difficult in practice because of high storage costs and computational requirements. Therefore, techniques have been proposed to generate compact similarity-preserving representations of time series, enabling real-time similarity search on large in-memory data collections. However, the existing techniques are not ideally suited for assessing similarity when sequences are locally out of phase. In this paper, we propose the use of product quantization for efficient similarity-based comparison of time series under time warping. The idea is to first compress the data by partitioning each time series into equal-length subsequences, each of which is represented by a short code. The distance between two time series can then be efficiently approximated by pre-computed elastic distances between their codes. The partitioning into subsequences forces unwanted alignments, which we address with a pre-alignment step using the maximal overlap discrete wavelet transform (MODWT). To demonstrate the efficiency and accuracy of our method, we perform an extensive experimental evaluation on benchmark datasets in nearest-neighbor classification and clustering applications. Overall, the proposed solution emerges as a highly efficient replacement for elastic measures in time series applications, in terms of both memory usage and computation time.
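
To make the encode-then-look-up idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes dynamic time warping (DTW) as the elastic measure, picks codewords by random sampling from the training set where a real system would presumably learn them by clustering, and omits the MODWT pre-alignment step entirely. All function names (`dtw_sq`, `split`, `fit_codebooks`, `encode`, `approx_dist`) are hypothetical.

```python
import math
import random


def dtw_sq(a, b):
    """Squared-error DTW cost between two sequences (no warping window)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def split(series, n_parts):
    """Partition a series into n_parts equal-length subsequences.

    Assumes len(series) is divisible by n_parts; any remainder is dropped.
    """
    sub_len = len(series) // n_parts
    return [series[p * sub_len:(p + 1) * sub_len] for p in range(n_parts)]


def fit_codebooks(train, n_parts, n_codes, seed=0):
    """Build one codebook per partition plus its codeword-to-codeword DTW table.

    Assumption: codewords are sampled training subsequences, not cluster centers.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(train), n_codes)
    codebooks = [[split(s, n_parts)[p] for s in sample] for p in range(n_parts)]
    # tables[p][i][j] holds the pre-computed DTW cost between codewords i and j
    # of partition p.
    tables = [[[dtw_sq(ci, cj) for cj in book] for ci in book] for book in codebooks]
    return codebooks, tables


def encode(series, codebooks):
    """Replace each subsequence by the index of its nearest codeword under DTW."""
    parts = split(series, len(codebooks))
    return [
        min(range(len(book)), key=lambda k: dtw_sq(sub, book[k]))
        for sub, book in zip(parts, codebooks)
    ]


def approx_dist(code_a, code_b, tables):
    """Approximate distance between two encoded series using table lookups only."""
    total = sum(tables[p][i][j] for p, (i, j) in enumerate(zip(code_a, code_b)))
    return math.sqrt(total)
```

After `fit_codebooks` has run, each series is stored as `n_parts` small integers, and comparing two encoded series costs `n_parts` table lookups with no further DTW computation. Because each partition is warped independently in this sketch, alignments cannot cross partition boundaries, which is the limitation the pre-alignment step in the abstract is meant to address.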