Multi-view clustering methods are essential for stratifying patients into subgroups with similar molecular characteristics. In recent years, a wide range of methods has been developed for this purpose. However, because cancer-related data are highly diverse, a single method may not perform sufficiently well in all cases. We present Parea, a multi-view hierarchical ensemble clustering approach for disease subtype discovery. We demonstrate its performance on several machine learning benchmark datasets, and we apply and validate the methodology on real-world multi-view cancer patient data. Parea outperforms the current state-of-the-art on six out of seven analysed cancer types. We have integrated Parea into our Python package, Pyrea (https://github.com/mdbloice/Pyrea), which enables the effortless and flexible design of ensemble workflows while incorporating a wide range of fusion and clustering algorithms.
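To make the idea of a multi-view hierarchical ensemble concrete, the following is a minimal sketch in plain Python. It is not the Parea algorithm and does not use the Pyrea API. It assumes one common ensemble scheme: each view is clustered hierarchically on its own, the per-view results are fused by averaging co-association matrices, and the fused matrix is clustered hierarchically again. The function names (`agglomerative`, `co_association`, `ensemble_cluster`) and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerative(dist, k):
    """Average-linkage agglomerative clustering on a full distance matrix.

    Merges the two closest clusters until only k remain, then returns
    one integer label per sample.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            mean = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or mean < best[0]:
                best = (mean, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def co_association(labels):
    """Matrix with 1.0 where two samples share a cluster, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def ensemble_cluster(views, k):
    """Cluster each view, average the co-association matrices (fusion),
    then cluster the fused matrix. Assumed scheme, not Parea itself."""
    n = len(views[0])
    fused = [[0.0] * n for _ in range(n)]
    for view in views:
        dist = [[euclidean(p, q) for q in view] for p in view]
        co = co_association(agglomerative(dist, k))
        for i in range(n):
            for j in range(n):
                fused[i][j] += co[i][j] / len(views)
    # High agreement across views means small distance.
    fused_dist = [[1.0 - fused[i][j] for j in range(n)] for i in range(n)]
    return agglomerative(fused_dist, k)


# Toy data: six samples described by two views, with two obvious groups.
view_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
view_b = [[1.0], [1.1], [0.9], [9.0], [9.2], [8.8]]

labels = ensemble_cluster([view_a, view_b], k=2)
print(labels)
```

In this sketch the fusion step is a simple average, so every view contributes equally. A real workflow would typically use an optimised clustering library and may choose different clustering and fusion algorithms per view.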