Pan-Cancer Integrative Histology-Genomic Analysis via Interpretable Multimodal Deep Learning

Richard J. Chen, Ming Y. Lu, Drew F. K. Williamson, Tiffany Y. Chen, Jana Lipkova, Muhammad Shaban, Maha Shady, Mane Williams, Bumjin Joo, Zahra Noor, Faisal Mahmood

Source: arXiv. Demo: http://pancancer.mahmoodlab.org

The rapidly emerging field of deep learning-based computational pathology has shown promise for developing objective prognostic models from histology whole slide images (WSIs). However, most prognostic models are based on either histology or genomics alone and do not address how the two can be integrated to develop joint image-omic prognostic models. Additionally, identifying explainable morphological and molecular descriptors from these models that govern such prognosis is of interest. We used multimodal deep learning to integrate gigapixel whole slide pathology images, RNA-seq abundance, copy number variation, and mutation data from 5,720 patients across 14 major cancer types. Our interpretable, weakly-supervised, multimodal deep learning algorithm fuses these heterogeneous modalities to predict outcomes and, via multimodal interpretability, discovers prognostic features that correlate with poor and favorable outcomes. We compared our model with unimodal deep learning models trained on histology slides alone and on molecular profiles alone, and demonstrated a performance increase in risk stratification on 9 out of 14 cancers. In addition, we analyzed the morphologic and molecular markers responsible for prognostic predictions across all cancer types. All analyzed data, including morphological and molecular correlates of patient prognosis across the 14 cancer types at the disease and patient levels, are presented in an interactive open-access database (http://pancancer.mahmoodlab.org) to allow for further exploration and prognostic biomarker discovery. To validate that these model explanations are prognostic, we further analyzed high-attention morphological regions in WSIs; this analysis indicates that tumor-infiltrating lymphocyte presence correlates with favorable cancer prognosis in 9 out of 14 cancer types studied.
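To make the idea of weakly-supervised multimodal fusion more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a slide is represented as a bag of patch embeddings with only a patient-level label, pools the patches with softmax attention weights, and concatenates the pooled slide embedding with a molecular feature vector before a linear risk score. The function names (`attention_pool`, `fuse_and_score`), the dimensions, the random weights, and the use of plain concatenation as the fusion step are all illustrative assumptions; the abstract does not specify any of them.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    e = np.exp(shifted)
    return e / e.sum()


def attention_pool(patches, w):
    """Pool a bag of patch embeddings into one slide embedding.

    patches: (n_patches, d) array of patch features (hypothetical).
    w:       (d,) scoring vector (hypothetical; would be learned in practice).
    Returns the attention-weighted mean and the per-patch attention weights.
    """
    scores = patches @ w              # one scalar score per patch
    attn = softmax(scores)            # weights are non-negative and sum to 1
    slide_embedding = attn @ patches  # weighted average of patch features
    return slide_embedding, attn


def fuse_and_score(slide_embedding, molecular, beta):
    """Fuse by concatenation (an assumption) and return a linear risk score."""
    fused = np.concatenate([slide_embedding, molecular])
    return float(fused @ beta)


# Synthetic stand-ins for real data: 500 patches with 16-dim features,
# and an 8-dim molecular profile for the same patient.
rng = np.random.default_rng(0)
patches = rng.normal(size=(500, 16))
molecular = rng.normal(size=8)
w = rng.normal(size=16)
beta = rng.normal(size=24)

slide_embedding, attn = attention_pool(patches, w)
risk = fuse_and_score(slide_embedding, molecular, beta)

# Indices of the five highest-attention patches: the kind of regions
# one would inspect when interpreting a prediction.
top_patches = np.argsort(attn)[::-1][:5]
```

In a sketch like this, the attention weights double as an interpretability signal: sorting patches by `attn` points to the regions that contributed most to the slide embedding, which is the general idea behind inspecting high-attention regions.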
