Learning semantically rich representations from raw unlabeled time series data is critical for downstream tasks such as classification and forecasting. Contrastive learning has recently shown promising representation learning capability in the absence of expert annotations. However, existing contrastive approaches generally treat each instance independently, which leads to false negative pairs, i.e., pairs treated as negatives even though they share the same semantics. To tackle this problem, we propose MHCCL, a Masked Hierarchical Cluster-wise Contrastive Learning model, which exploits semantic information obtained from a hierarchical structure consisting of multiple latent partitions for multivariate time series. Motivated by the observation that fine-grained clustering preserves higher purity while coarse-grained clustering reflects higher-level semantics, we propose a novel downward masking strategy that filters out false negatives and supplements positives by incorporating multi-granularity information from the clustering hierarchy. In addition, a novel upward masking strategy is designed in MHCCL to remove outliers from the clusters at each partition in order to refine prototypes, which helps speed up the hierarchical clustering process and improves the clustering quality. We conduct experimental evaluations on seven widely used multivariate time series datasets. The results demonstrate that MHCCL outperforms state-of-the-art approaches for unsupervised time series representation learning.
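To make the false-negative issue concrete, the following is a minimal sketch of a cluster-masked contrastive loss. It is not the MHCCL implementation: the function name `cluster_masked_contrastive_loss`, the `temperature` value, and the use of a single flat partition (rather than the full clustering hierarchy with downward and upward masking) are all assumptions made for illustration. Instances that share a cluster label are counted as positives instead of negatives.

```python
import numpy as np

def cluster_masked_contrastive_loss(z, cluster_ids, temperature=0.5):
    """Illustrative loss: same-cluster instances are positives, not negatives.

    z           : (n, d) array of instance embeddings (hypothetical input)
    cluster_ids : (n,) array of cluster labels from ONE partition (assumption)
    """
    z = np.asarray(z, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n = z.shape[0]

    # Cosine similarity between all pairs, scaled by the temperature.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    self_mask = np.eye(n, dtype=bool)
    same_cluster = cluster_ids[:, None] == cluster_ids[None, :]
    pos_mask = same_cluster & ~self_mask  # positives: same cluster, not self

    # Exclude self-similarity from the softmax denominator.
    sim = np.where(self_mask, -np.inf, sim)
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Average log-probability over each anchor's positives.
    pos_count = pos_mask.sum(axis=1)
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no anchor has a positive under this partition
    return float(-(pos_log_prob[has_pos] / pos_count[has_pos]).mean())


# Toy embeddings forming two tight groups (made-up values).
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_consistent = cluster_masked_contrastive_loss(z, [0, 0, 1, 1])
loss_inconsistent = cluster_masked_contrastive_loss(z, [0, 1, 0, 1])
```

In this toy setup, cluster labels that agree with the geometry of the embeddings yield a lower loss than labels that pair dissimilar instances, which is the behavior one would want from a cluster-aware objective. A hierarchical variant would presumably combine masks from several partitions of different granularity.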