In epidemiological and clinical studies, identifying patient phenotypes from longitudinal profiles is critical to understanding how a disease develops. This study was motivated by data from the CHILD Cohort Study, a Canadian birth cohort study. Our goal was to use multiple longitudinal respiratory traits to cluster participants into subgroups with similar longitudinal respiratory profiles, and thereby to identify clinically relevant disease phenotypes. To account for the distinct structures and types of these longitudinal markers, we proposed a novel joint model for clustering mixed-type (continuous, discrete and categorical) multivariate longitudinal data. We also developed a Markov chain Monte Carlo algorithm to estimate the posterior distribution of the model parameters. Analyses of the CHILD Cohort data and of simulated data were presented and discussed. The results demonstrated that the proposed model is a useful analytical tool for clustering multivariate mixed-type longitudinal data. We developed an R package, BCClong, that implements the proposed model efficiently.