Optimize Deep Learning Models for Prediction of Gene Mutations Using Unsupervised Clustering

Deep learning has become the mainstream methodological choice for analyzing and interpreting whole-slide digital pathology images (WSIs). It is commonly assumed that tumor regions carry most of the predictive information. In this paper, we propose an unsupervised clustering-based multiple-instance learning approach and apply it to develop deep-learning models that predict gene mutations from WSIs of three cancer types in The Cancer Genome Atlas (TCGA): colorectal cancer (CRC), lung adenocarcinoma (LUAD), and head and neck squamous cell carcinoma (HNSCC). We show that unsupervised clustering of image patches helps identify predictive patches and exclude patches that lack predictive information. This improves gene-mutation prediction in all three cancer types compared with both a WSI-based method that uses all patches without selection and models based only on tumor regions. Additionally, our proposed algorithm outperforms two recently published baseline algorithms that also leverage unsupervised clustering to assist model prediction. Through its resolved probability scores, the unsupervised clustering-based approach allows identification of the spatial regions related to mutation of a specific gene, highlighting the heterogeneity of a predicted genotype within the tumor microenvironment. Finally, our study demonstrates that selecting the tumor regions of WSIs is not always the best way to identify patches for predicting gene mutations: other tissue types in the tumor microenvironment may provide better predictive ability than tumor tissue.
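The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea of clustering-based patch selection, not the authors' algorithm. It assumes that every patch already has a feature vector (for example, from a CNN) and a patch-level mutation probability. It also assumes that a cluster counts as "predictive" when the mean probability of its patches separates mutant from wild-type slides with an AUC of at least a threshold, and that a slide is scored by averaging its kept patches. The names `fit_cluster_selection`, `predict_slide`, `toy_slide`, and `min_auc`, the AUC selection rule, the mean pooling, and the toy data are all hypothetical. The per-patch map that `predict_slide` returns mirrors the idea of localizing mutation-related regions.

```python
"""Hypothetical sketch of clustering-based patch selection for mutation prediction.

Not the authors' implementation. Assumed: each patch has a feature vector and a
patch-level mutation probability; a cluster is kept when its slide-level AUC
reaches `min_auc`; a slide is scored by the mean probability of kept patches.
"""
from statistics import mean


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))


def kmeans(points, k, iters=20):
    """Plain k-means, deterministically initialised with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return centroids, labels


def auc(scores, labels):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_cluster_selection(slides, slide_labels, k=2, min_auc=0.6):
    """slides: list of slides, each a list of (feature_vector, patch_prob) pairs.

    Returns the cluster centroids and the indices of clusters judged predictive.
    """
    pooled = [(i, feat, prob) for i, slide in enumerate(slides) for feat, prob in slide]
    centroids, labels = kmeans([feat for _, feat, _ in pooled], k)
    selected = []
    for c in range(k):
        # Slide-level score within cluster c = mean probability of its patches.
        per_slide = {}
        for (i, _, prob), lab in zip(pooled, labels):
            if lab == c:
                per_slide.setdefault(i, []).append(prob)
        ids = sorted(per_slide)
        ys = [slide_labels[i] for i in ids]
        if 0 < sum(ys) < len(ys):  # AUC needs both mutant and wild-type slides
            if auc([mean(per_slide[i]) for i in ids], ys) >= min_auc:
                selected.append(c)
    return centroids, selected


def predict_slide(slide, centroids, selected):
    """Score a slide from patches in selected clusters; also return a patch map."""
    heatmap = [prob if nearest(feat, centroids) in selected else None  # None = excluded
               for feat, prob in slide]
    kept = [p for p in heatmap if p is not None]
    return (mean(kept) if kept else None), heatmap


def toy_slide(mutant):
    """Synthetic slide: 'tumor-like' patches near (0, 0) carry no signal (p = 0.5);
    'stroma-like' patches near (10, 10) carry the mutation signal."""
    p = 0.9 if mutant else 0.1
    return [([0.0, 0.0], 0.5), ([10.0, 10.0], p), ([0.5, 0.0], 0.5), ([10.0, 9.5], p)]


if __name__ == "__main__":
    train = [toy_slide(True), toy_slide(True), toy_slide(False), toy_slide(False)]
    centroids, selected = fit_cluster_selection(train, [1, 1, 0, 0])
    print(selected)  # [1]
    print(predict_slide(toy_slide(True), centroids, selected))  # (0.9, [None, 0.9, None, 0.9])
```

In this synthetic example only the non-tumor cluster is kept. That echoes, without reproducing, the abstract's point that tissue outside tumor regions can be more predictive. The first-k initialization and the AUC threshold are simplifications chosen for determinism. A practical version would more likely use k-means++ initialization, held-out slides for cluster selection, and a learned aggregation in place of a simple mean.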
