The discovery of disease subtypes is an essential step toward precision medicine, and disease subtyping via omics data has become a popular approach. While promising, the subtypes obtained from current approaches are not necessarily associated with clinical outcomes. Because modern epidemiological cohorts collect rich clinical data alongside omics data, there is an urgent need for an outcome-guided clustering algorithm that fully integrates the phenotypic data with the high-dimensional omics data. Hence, we extended a sparse K-means method to an outcome-guided sparse K-means (GuidedSparseKmeans) method, which incorporates a phenotypic variable from the clinical dataset to guide gene selection from the high-dimensional omics data. We demonstrated the superior performance of GuidedSparseKmeans by comparing it with existing clustering methods in simulations and in applications to high-dimensional transcriptomic data of breast cancer and Alzheimer's disease. Our algorithm has been implemented as an R package, which is publicly available on GitHub (https://github.com/LingsongMeng/GuidedSparseKmeans).
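The idea of letting a clinical outcome guide gene selection before clustering can be illustrated with a toy sketch. The Python code below is not the GuidedSparseKmeans algorithm and does not use the R package's interface. It is a simplified two-step stand-in: it ranks genes by squared Pearson correlation with the outcome, keeps the top few, and runs plain K-means on them, rather than performing the gene-weight optimization that sparse K-means carries out. All function names, the simulated data, and the parameter values are hypothetical and chosen only for illustration.

```python
import random


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def select_guided_genes(X, outcome, n_keep):
    """Hypothetical helper: keep the n_keep genes most associated with the outcome.

    X is a list of samples; each sample is a list of gene expression values.
    """
    n_genes = len(X[0])
    scores = [pearson_r2([row[j] for row in X], outcome) for j in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it was
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Simulated toy data (an assumption, not data from the paper):
# 60 samples in two latent subtypes, 50 genes, of which the first 5 differ by subtype.
rng = random.Random(1)
n_samples, n_genes, n_informative = 60, 50, 5
groups = [i % 2 for i in range(n_samples)]
X = [
    [(6.0 * g if j < n_informative else 0.0) + rng.gauss(0, 1) for j in range(n_genes)]
    for g in groups
]
# A continuous clinical outcome that depends on the latent subtype.
outcome = [3.0 * g + rng.gauss(0, 0.5) for g in groups]

selected = select_guided_genes(X, outcome, n_keep=n_informative)
X_selected = [[row[j] for j in selected] for row in X]
labels = kmeans(X_selected, k=2)
print("selected genes:", selected)
```

A hard top-n filter is used here only because it keeps the sketch short; it discards the soft weighting that makes sparse K-means sparse in a data-driven way, so results on real omics data should come from the published R package.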