Time series shapelets are discriminative subsequences that have recently been found effective for time series clustering (TSC). Shapelets are also convenient for interpreting the resulting clusters. The main challenge for shapelet-based TSC is therefore to discover high-quality, variable-length shapelets that discriminate between clusters. In this paper, we propose a novel autoencoder-shapelet approach (AUTOSHAPE), the first study to take advantage of both autoencoders and shapelets to determine shapelets in an unsupervised manner. The autoencoder is specially designed to learn high-quality shapelets. More specifically, to guide the latent representation learning, we employ a recent self-supervised loss to learn unified embeddings for variable-length shapelet candidates (time series subsequences) of different variables, and we propose a diversity loss to select the discriminating embeddings in the unified space. We introduce a reconstruction loss to recover shapelets in the original time series space for clustering. Finally, we adopt the Davies-Bouldin index (DBI) to inform AUTOSHAPE of the clustering performance during learning. We present extensive experiments on AUTOSHAPE. To evaluate clustering performance on univariate time series (UTS), we compare AUTOSHAPE with 15 representative methods on UCR archive datasets. To study performance on multivariate time series (MTS), we evaluate AUTOSHAPE against 5 competitive methods on 30 UEA archive datasets. The results show that AUTOSHAPE achieves the best clustering performance among all compared methods. We also interpret the clusters through their shapelets and obtain interesting intuitions about the clusters in two UTS case studies and one MTS case study.
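To make two of the ingredients named above concrete, the following is a minimal sketch, not the AUTOSHAPE implementation: it shows how a series can be compared with a shapelet candidate (the minimum distance over all same-length subsequences) and how the Davies-Bouldin index can be computed for a given cluster assignment. The function names, the length normalization of the distance, and the toy data are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum length-normalized Euclidean distance between `shapelet`
    and any same-length subsequence of `series` (assumed definition)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        sq = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, math.sqrt(sq / m))
    return best


def shapelet_transform(dataset, shapelets):
    """Represent each series by its distances to the given shapelets."""
    return [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j).
    Assumes at least two clusters with distinct centroids."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        centroids[c] = centroid
        scatter[c] = sum(math.dist(p, centroid) for p in members) / len(members)
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


if __name__ == "__main__":
    # Hypothetical toy data: two series contain a bump, two are flat.
    data = [
        [0, 0, 1, 2, 1, 0, 0],
        [0, 1, 2, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ]
    features = shapelet_transform(data, shapelets=[[1, 2, 1]])
    print(features)
    print(davies_bouldin(features, labels=[0, 0, 1, 1]))
```

A lower DBI indicates more compact and better separated clusters, which is why it can serve as a label-free signal of clustering quality during training.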