Learning semantically rich representations from raw unlabeled time series data is critical for downstream tasks such as classification and forecasting. Contrastive learning has recently shown promising representation learning capability in the absence of expert annotations. However, existing contrastive approaches generally treat each instance independently, which leads to false negative pairs that share the same semantics. To tackle this problem, we propose MHCCL, a Masked Hierarchical Cluster-wise Contrastive Learning model, which exploits semantic information obtained from the hierarchical structure consisting of multiple latent partitions for multivariate time series. Motivated by the observation that fine-grained clustering preserves higher purity while coarse-grained clustering reflects higher-level semantics, we propose a novel downward masking strategy that filters out false negatives and supplements positives by incorporating the multi-granularity information from the clustering hierarchy. In addition, a novel upward masking strategy is designed in MHCCL to remove outliers from clusters at each partition to refine prototypes, which helps speed up the hierarchical clustering process and improves the clustering quality. We conduct experimental evaluations on seven widely used multivariate time series datasets. The results demonstrate the superiority of MHCCL over state-of-the-art approaches for unsupervised time series representation learning.