Learning semantically rich representations from raw unlabeled time series data is critical for downstream tasks such as classification and forecasting. Contrastive learning has recently shown promising representation learning capability in the absence of expert annotations. However, existing contrastive approaches generally treat each instance independently, which produces false negative pairs that in fact share the same semantics. To tackle this problem, we propose MHCCL, a Masked Hierarchical Cluster-wise Contrastive Learning model, which exploits the semantic information contained in a hierarchical structure of multiple latent partitions of multivariate time series. Motivated by the observation that fine-grained clustering preserves higher purity while coarse-grained clustering reflects higher-level semantics, we propose a novel downward masking strategy that filters out false negatives and supplements positives by incorporating multi-granularity information from the clustering hierarchy. In addition, a novel upward masking strategy in MHCCL removes outliers from the clusters at each partition to refine the prototypes, which speeds up the hierarchical clustering process and improves clustering quality. We conduct experimental evaluations on seven widely used multivariate time series datasets. The results demonstrate the superiority of MHCCL over state-of-the-art approaches for unsupervised time series representation learning.
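The two masking ideas above can be sketched on toy data. This is a minimal illustration, not the paper's actual algorithm: it builds one agglomerative hierarchy over toy 2-D embeddings, reads off a fine and a coarse partition, masks same-cluster instances out of the negative set (the downward idea), and drops per-cluster outliers before averaging prototypes (the upward idea). The names `downward_mask`, `refined_prototypes`, and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy "representations": two well-separated groups of 2-D embeddings,
# standing in for encoded multivariate time series instances.
emb = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
                 rng.normal(3.0, 0.1, (10, 2))])

# One agglomerative hierarchy yields partitions at several granularities.
Z = linkage(emb, method="ward")
fine = fcluster(Z, t=4, criterion="maxclust")    # fine-grained partition
coarse = fcluster(Z, t=2, criterion="maxclust")  # coarse-grained partition

def downward_mask(labels, anchor):
    """Downward-masking sketch: instances in the anchor's cluster are
    likely false negatives, so mask them out of the negative set."""
    return labels != labels[anchor]

def refined_prototypes(emb, labels, keep_ratio=0.8):
    """Upward-masking sketch: drop the points farthest from each cluster
    centre, then average the remainder into a refined prototype."""
    protos = {}
    for c in np.unique(labels):
        pts = emb[labels == c]
        dist = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
        keep = max(1, int(np.ceil(keep_ratio * len(pts))))
        protos[c] = pts[np.argsort(dist)[:keep]].mean(axis=0)
    return protos

neg_mask = downward_mask(coarse, anchor=0)  # excludes the anchor's own cluster
protos = refined_prototypes(emb, fine)      # one refined prototype per cluster
```

In the full model these operations would run over learned representations at every level of the hierarchy; here a single linkage tree and fixed cluster counts stand in for the multi-granularity partitions.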