We introduce ASTRIDE (Adaptive Symbolization for Time seRIes DatabasEs), a novel symbolic representation of time series, along with its accelerated variant FASTRIDE (Fast ASTRIDE). Unlike most symbolization procedures, ASTRIDE is adaptive in both its segmentation step, through change-point detection, and its quantization step, through the use of quantiles. Instead of proceeding signal by signal, ASTRIDE builds a dictionary of symbols shared by all signals in a data set. We also introduce D-GED (Dynamic General Edit Distance), a novel similarity measure on symbolic representations based on the general edit distance. We demonstrate the performance of the ASTRIDE and FASTRIDE representations compared to SAX (Symbolic Aggregate approXimation), 1d-SAX, SFA (Symbolic Fourier Approximation), and ABBA (Adaptive Brownian Bridge-based Aggregation) on reconstruction tasks and, where applicable, on classification tasks. These algorithms are evaluated on 86 univariate, equal-length data sets from the UCR Time Series Classification Archive. An open-source GitHub repository, astride, is available to reproduce all the experiments in Python.
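To make the two adaptive steps and the shared dictionary concrete, here is a minimal Python/NumPy sketch under stated assumptions. It is not the astride package's API, and all function names are hypothetical. Change points are found per signal by greedy binary segmentation with a fixed number of breakpoints (one simple change-point method among many), each segment is summarized by its mean, and the alphabet is defined by quantiles of the segment means pooled over the whole data set, so every signal is encoded with one common dictionary. The distance shown is a plain general edit distance whose substitution cost is the gap between symbol centers; it is not D-GED, whose exact costs are not specified here.

```python
import numpy as np


def binary_segmentation(signal, n_bkps):
    """Greedy binary segmentation under an L2 (piecewise-constant mean) cost.

    Hypothetical helper: returns sorted breakpoint indices ending with len(signal).
    """
    signal = np.asarray(signal, dtype=float)
    bkps = [0, len(signal)]

    def cost(a, b):
        seg = signal[a:b]
        return float(((seg - seg.mean()) ** 2).sum()) if b > a else 0.0

    for _ in range(n_bkps):
        best_gain, best_split = 0.0, None
        # Try every split inside every current segment; keep the largest cost reduction.
        for start, end in zip(bkps[:-1], bkps[1:]):
            total = cost(start, end)
            for t in range(start + 1, end):
                gain = total - cost(start, t) - cost(t, end)
                if gain > best_gain:
                    best_gain, best_split = gain, t
        if best_split is None:  # no split reduces the cost any further
            break
        bkps = sorted(bkps + [best_split])
    return bkps[1:]


def segment_means(signal, bkps):
    """Mean value of each segment delimited by the breakpoints."""
    signal = np.asarray(signal, dtype=float)
    starts = [0] + list(bkps[:-1])
    return np.array([signal[s:e].mean() for s, e in zip(starts, bkps)])


def fit_quantile_bins(pooled_means, alphabet_size):
    """Interior quantiles of the pooled segment means -> equiprobable bins."""
    probs = np.linspace(0.0, 1.0, alphabet_size + 1)[1:-1]
    return np.quantile(pooled_means, probs)


def symbolize(means, edges):
    """Map each segment mean to a symbol index in [0, alphabet_size - 1]."""
    return np.digitize(means, edges)


def general_edit_distance(s, t, sub_cost, indel_cost):
    """Edit distance with a custom substitution cost and a constant indel cost."""
    n, m = len(s), len(t)
    d = np.zeros((n + 1, m + 1))
    d[:, 0] = np.arange(n + 1) * indel_cost
    d[0, :] = np.arange(m + 1) * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i, j] = min(
                d[i - 1, j] + indel_cost,  # deletion
                d[i, j - 1] + indel_cost,  # insertion
                d[i - 1, j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitution
            )
    return float(d[n, m])


if __name__ == "__main__":
    # Hypothetical toy data set of piecewise-constant signals.
    levels = [(0.0, 2.0, -1.0), (1.0, -1.0, 0.0), (0.0, 2.0, -0.5)]
    dataset = [np.concatenate([np.full(30, a), np.full(40, b), np.full(30, c)])
               for a, b, c in levels]
    alphabet_size = 4
    means = [segment_means(x, binary_segmentation(x, 2)) for x in dataset]
    pooled = np.concatenate(means)  # one dictionary for the whole data set
    edges = fit_quantile_bins(pooled, alphabet_size)
    centers = np.quantile(pooled, (np.arange(alphabet_size) + 0.5) / alphabet_size)
    words = [symbolize(m, edges) for m in means]
    dist = general_edit_distance(words[0], words[2],
                                 lambda a, b: abs(centers[a] - centers[b]), 1.0)
    print(words, dist)
```

Pooling the segment means before computing quantiles is what makes the dictionary common to all signals: a given symbol denotes the same value range in every signal, so symbolic sequences from different signals can be compared directly with an edit distance.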