The study of natural and human-made processes often results in long sequences of temporally ordered values, also known as time series (TS). Such processes often consist of multiple states, e.g. the operating modes of a machine, and state changes in the observed process result in changes in the distribution or shape of the measured values. Time series segmentation (TSS) tries to find such changes in a TS post hoc in order to deduce changes in the data-generating process. TSS is typically approached as an unsupervised learning problem that aims to identify segments distinguishable by some statistical property. Current algorithms for TSS require domain-dependent hyper-parameters to be set by the user, or make assumptions about the TS value distribution or the types of detectable changes, all of which limit their applicability. Common hyper-parameters are the measure of segment homogeneity and the number of change points, which are particularly hard to tune for each data set. We present ClaSP, a novel, highly accurate, hyper-parameter-free and domain-agnostic method for TSS. ClaSP hierarchically splits a TS into two parts. A change point is determined by training a binary TS classifier for each possible split point and selecting the split whose classifier is best at identifying which of the two partitions a subsequence comes from. ClaSP learns its two main model parameters from the data using two novel bespoke algorithms. In our experimental evaluation using a benchmark of 107 data sets, we show that ClaSP outperforms the state of the art in terms of accuracy and is fast and scalable. Furthermore, we highlight properties of ClaSP using several real-world case studies.
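The split-selection idea described above (score a binary classifier at every candidate split and keep the best one) can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-nearest-neighbour classifier, the balanced-accuracy score, the hand-set `width` and `margin` parameters, and all function names are assumptions made for illustration only, whereas ClaSP itself learns its model parameters from the data.

```python
import numpy as np


def sliding_windows(ts, width):
    """Return all overlapping subsequences of length `width`, z-normalised."""
    windows = np.lib.stride_tricks.sliding_window_view(ts, width).astype(float)
    windows = windows - windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    return windows / np.where(std > 1e-8, std, 1.0)


def nearest_neighbours(windows, exclusion):
    """Index of each window's nearest neighbour (squared Euclidean distance).

    Windows closer than `exclusion` positions are ignored so that a window
    is never matched with a trivially overlapping copy of itself.
    """
    sq = (windows ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (windows @ windows.T)
    idx = np.arange(len(windows))
    dist[np.abs(idx[:, None] - idx[None, :]) < exclusion] = np.inf
    return dist.argmin(axis=1)


def best_split(ts, width, margin):
    """Score every candidate split and return (best split index, scores).

    Assumption (illustrative only): for a candidate split, windows starting
    left of it get label False and the others True; each window is then
    "classified" with its nearest neighbour's label, and the split is scored
    by balanced accuracy. `width` and `margin` are hand-set here.
    """
    windows = sliding_windows(np.asarray(ts, dtype=float), width)
    nn = nearest_neighbours(windows, exclusion=width)
    n = len(windows)
    scores = np.full(n, -np.inf)
    for split in range(margin, n - margin):
        labels = np.arange(n) >= split       # False = left part, True = right part
        predicted = labels[nn]               # 1-NN prediction for every window
        acc_left = np.mean(~predicted[:split])
        acc_right = np.mean(predicted[split:])
        scores[split] = 0.5 * (acc_left + acc_right)
    return int(scores.argmax()), scores
```

Calling `best_split(ts, width=20, margin=40)` on a series whose shape changes once returns a window start index near the change together with the full score profile. Applying the same function recursively to the two resulting parts would mirror the hierarchical splitting mentioned in the abstract, although a stopping criterion (not sketched here) would then be needed.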