Multivariate time-series (MTS) clustering discovers the intrinsic grouping patterns of temporal data samples. Although time series provide rich discriminative information, they also contain substantial redundancy, such as steady-state machine operation records and zero-output periods of solar power generation. Such redundancy diminishes the attention given to discriminative timestamps during representation learning, leading to performance bottlenecks in MTS clustering. Masking has been widely adopted to enhance MTS representations, with temporal reconstruction tasks designed to capture critical information from MTS. However, most existing masking strategies operate as standalone preprocessing steps, isolated from the learning process, which hinders dynamic adaptation to the importance of clustering-critical timestamps. Accordingly, this paper proposes the Evolving-masked MTS Clustering (EMTC) method, whose model architecture comprises Importance-aware Variate-wise Masking (IVM) and Multi-Endogenous Views (MEV) generation modules. IVM adaptively guides the model toward more discriminative representations for clustering, while the reconstruction and cluster-guided contrastive learning pathways enhance representation learning and connect it to the clustering task. Extensive experiments on 15 benchmark datasets demonstrate the superiority of EMTC over eight SOTA methods, with EMTC achieving an average improvement of 4.85% in F1-Score over the strongest baselines.
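To make the idea of masking guided by timestamp importance more concrete, the following is a minimal sketch and not the EMTC implementation. The abstract does not specify how IVM computes or uses importance, so everything here is an assumption: the function names `variate_wise_mask` and `masked_reconstruction_loss`, the externally supplied `importance` scores, the choice to mask important timestamps more often, and the fixed `mask_ratio`. In a learned setting the scores would presumably be updated during training rather than fixed in advance.

```python
import random


def variate_wise_mask(series, importance, mask_ratio=0.3, seed=0):
    """Mask each variate independently, favoring high-importance timestamps.

    series, importance: lists of shape [n_steps][n_vars]; importance >= 0.
    Returns (masked_series, mask), where mask[t][v] is True if masked.
    Hypothetical helper: the sampling rule is an assumption, not IVM itself.
    """
    rng = random.Random(seed)
    n_steps, n_vars = len(series), len(series[0])
    k = max(1, round(mask_ratio * n_steps))  # masked timestamps per variate
    mask = [[False] * n_vars for _ in range(n_steps)]
    for v in range(n_vars):
        # Weighted sampling without replacement (Efraimidis-Spirakis keys):
        # draw u ~ U(0, 1), rank by u ** (1 / weight), keep the top k.
        keys = []
        for t in range(n_steps):
            weight = importance[t][v] + 1e-8  # avoid division by zero
            keys.append((rng.random() ** (1.0 / weight), t))
        for _, t in sorted(keys, reverse=True)[:k]:
            mask[t][v] = True
    masked = [
        [0.0 if mask[t][v] else series[t][v] for v in range(n_vars)]
        for t in range(n_steps)
    ]
    return masked, mask


def masked_reconstruction_loss(series, reconstruction, mask):
    """Mean squared error computed over masked positions only."""
    errors = [
        (series[t][v] - reconstruction[t][v]) ** 2
        for t in range(len(series))
        for v in range(len(series[0]))
        if mask[t][v]
    ]
    return sum(errors) / len(errors)


# Toy usage: 10 timestamps, 4 variates, uniform importance.
series = [[float(t * 4 + v + 1) for v in range(4)] for t in range(10)]
importance = [[1.0] * 4 for _ in range(10)]
masked, mask = variate_wise_mask(series, importance, mask_ratio=0.3)
```

Because each variate gets its own mask, a redundant stretch in one channel (for example, a zero-output period) does not force the same timestamps to be masked in the others.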