Symbolic representations are a useful tool for the dimension reduction of temporal data, allowing for efficient storage of time series and efficient information retrieval from them. They can also enhance the training of machine learning algorithms on time series data through noise reduction and reduced sensitivity to hyperparameters. The adaptive Brownian bridge-based aggregation (ABBA) method is one such effective and robust symbolic representation, demonstrated to accurately capture important trends and shapes in time series. However, in its current form the method struggles to process very large time series. Here we present a new variant of the ABBA method, called fABBA. This variant uses a new aggregation approach tailored to the piecewise representation of time series. Replacing the k-means clustering used in ABBA with a sorting-based aggregation technique avoids repeated sum-of-squares error computations and thereby significantly reduces the computational complexity. In contrast to the original method, the new approach does not require the number of time series symbols to be specified in advance. Through extensive tests we demonstrate that the new method runs considerably faster than ABBA while also outperforming the popular SAX and 1d-SAX representations in terms of reconstruction accuracy. We further demonstrate that fABBA can compress other data types such as images.
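To illustrate what a sorting-based aggregation can look like, here is a minimal sketch in Python. It is not the fABBA implementation: the function name `greedy_aggregate`, the tolerance parameter `alpha`, the choice to sort pieces by Euclidean norm, and the (length, increment) layout of each piece are all assumptions made for this example, and any scaling of the pieces is omitted. The sketch shows how a single sort plus an early-exit scan can group pieces without fixing the number of groups in advance and without iterated sum-of-squares updates.

```python
import numpy as np


def greedy_aggregate(pieces, alpha):
    """Group 2D pieces with a sort-then-scan heuristic (illustrative only).

    pieces : array-like of shape (n, 2), assumed to hold (length, increment).
    alpha  : hypothetical tolerance; a piece joins a group when it lies
             within distance alpha of the piece that started the group.
    Returns (labels, n_groups).
    """
    pieces = np.asarray(pieces, dtype=float)
    norms = np.linalg.norm(pieces, axis=1)
    # Sort once by norm; a stable sort keeps ties in input order.
    order = np.argsort(norms, kind="stable")
    labels = np.full(len(pieces), -1, dtype=int)
    n_groups = 0

    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # already absorbed by an earlier group
        labels[i] = n_groups  # piece i starts a new group
        for j in order[pos + 1:]:
            # By the reverse triangle inequality, a norm gap larger than
            # alpha implies a distance larger than alpha, so no later
            # piece in the sorted order can belong to this group.
            if norms[j] - norms[i] > alpha:
                break
            if labels[j] == -1 and np.linalg.norm(pieces[j] - pieces[i]) <= alpha:
                labels[j] = n_groups
        n_groups += 1

    return labels, n_groups


if __name__ == "__main__":
    demo = [[1.0, 0.0], [1.05, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 1.0]]
    labels, k = greedy_aggregate(demo, alpha=0.5)
    print(labels.tolist(), k)  # [0, 0, 2, 2, 1] 3
```

In this sketch the number of groups (and hence symbols) follows from `alpha` rather than being supplied up front, which mirrors the property described above. The early `break` is what the sorting buys: each group's scan stops as soon as the norm gap exceeds the tolerance.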