Self-supervised learning is a promising direction for clustering multivariate time-series data. However, real-world time-series data often contain missing values, and existing approaches require imputing them before clustering, which can add substantial computation and noise and lead to invalid interpretations. To address these challenges, we present a Self-supervised Learning-based Approach to Clustering multivariate Time-series data with missing values (SLAC-Time). SLAC-Time is a Transformer-based clustering method that uses time-series forecasting as a proxy task to leverage unlabeled data and learn more robust time-series representations. The method jointly learns the neural network parameters and the cluster assignments of the learned representations: it iteratively clusters the representations with K-means and then uses the resulting cluster assignments as pseudo-labels to update the model parameters. To evaluate the approach, we applied it to clustering and phenotyping Traumatic Brain Injury (TBI) patients in the TRACK-TBI dataset. Our experiments show that SLAC-Time outperforms the baseline K-means clustering algorithm in terms of the silhouette coefficient, Calinski-Harabasz index, Dunn index, and Davies-Bouldin index. We identified three TBI phenotypes that differ from one another in clinically significant variables as well as in clinical outcomes, including the Extended Glasgow Outcome Scale (GOSE) score, Intensive Care Unit (ICU) length of stay, and mortality rate. These results suggest that the TBI phenotypes identified by SLAC-Time could be used to develop targeted clinical trials and therapeutic strategies.
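The alternation described in the abstract (cluster the learned representations with K-means, then treat the cluster assignments as pseudo-labels for a parameter update) can be illustrated with a minimal pure-Python sketch. This is not the SLAC-Time implementation: a small linear map `W` stands in for the Transformer encoder, a linear head `V` with cross-entropy stands in for the pseudo-label training step, and the forecasting proxy task and missing-value handling are omitted. All names (`kmeans`, `train_on_pseudo_labels`, `alternate`), the toy data, and the hyperparameters are assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on lists of floats; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_on_pseudo_labels(W, V, data, labels, lr=0.05):
    """One SGD pass of cross-entropy on pseudo-labels; updates W and V in place."""
    for x, y in zip(data, labels):
        z = matvec(W, x)                      # representation (toy encoder)
        probs = softmax(matvec(V, z))         # pseudo-label prediction
        d_logits = [p - (1.0 if c == y else 0.0) for c, p in enumerate(probs)]
        # Gradient w.r.t. the representation, computed before V is modified.
        d_z = [sum(V[c][j] * d_logits[c] for c in range(len(V)))
               for j in range(len(z))]
        for c in range(len(V)):
            for j in range(len(z)):
                V[c][j] -= lr * d_logits[c] * z[j]
        for j in range(len(W)):
            for i in range(len(x)):
                W[j][i] -= lr * d_z[j] * x[i]

def alternate(data, k, hidden=2, rounds=5, seed=0):
    """Alternate K-means on representations with pseudo-label updates."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 0.5) for _ in data[0]] for _ in range(hidden)]
    for _ in range(rounds):
        reps = [matvec(W, x) for x in data]
        pseudo = kmeans(reps, k, seed=seed)
        # Cluster indices are arbitrary across rounds, so the head is reset
        # each round (a design choice of this sketch).
        V = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(k)]
        train_on_pseudo_labels(W, V, data, pseudo)
    # Final assignments come from clustering the updated representations.
    return kmeans([matvec(W, x) for x in data], k, seed=seed), W

# Toy, fully observed feature vectors (hypothetical data).
toy = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
assignments, encoder = alternate(toy, k=2)
print(assignments)
```

Resetting the head each round reflects the fact that K-means cluster indices carry no stable meaning from one clustering pass to the next; whether SLAC-Time does the same is not stated in the abstract.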