Clustering time-series data in healthcare is crucial for clinical phenotyping: it helps to understand patients' disease progression patterns and to design treatment guidelines tailored to homogeneous patient subgroups. While rich temporal dynamics enable the discovery of potential clusters beyond static correlations, two major challenges remain: i) discovering predictive patterns among the many potential temporal correlations in multivariate time-series data, and ii) associating individual temporal patterns with the target label distribution that best characterizes the underlying clinical progression. To address these challenges, we develop a novel temporal clustering method, T-Phenotype, to discover phenotypes of predictive temporal patterns from labeled time-series data. We introduce an efficient representation learning approach in the frequency domain that can encode variable-length, irregularly sampled time-series into a unified representation space, which is then used to identify various temporal patterns that potentially contribute to the target label using a new notion of path-based similarity. Through experiments on synthetic and real-world datasets, we show that T-Phenotype achieves the best phenotype discovery performance among all evaluated baselines. We further demonstrate the utility of T-Phenotype by uncovering clinically meaningful patient subgroups characterized by unique temporal patterns.