Machine learning methods can detect complex relationships between variables, but they usually do not exploit domain knowledge. This is a limitation because in many scientific disciplines, such as systems biology, domain knowledge is available in the form of graphs or networks, and using it can improve model performance. We therefore need network-based algorithms that are versatile and applicable in many research areas. In this work, we demonstrate subnetwork detection based on multi-modal node features using a novel Greedy Decision Forest with inherent interpretability. This interpretability will be crucial for retaining domain experts and gaining their trust in such algorithms. As a concrete application example, we focus on bioinformatics, systems biology and particularly biomedicine, but the presented methodology is applicable in many other domains as well. Systems biology is a good example of a field in which statistical, data-driven machine learning enables the analysis of large amounts of multi-modal biomedical data. This is important for reaching the goal of precision medicine, in which the complexity of patients is modeled at the system level so that medical decisions, health practices and therapies can be tailored to the individual patient. Our proposed approach can help to uncover disease-causing network modules from multi-omics data and thus to better understand complex diseases such as cancer.