Molecular phenotyping by gene expression profiling is common in contemporary cancer research and in molecular diagnostics. However, molecular profiling remains costly and resource-intensive to implement, and is only starting to be introduced into clinical diagnostics. Molecular changes occurring in tumors, including genetic alterations and gene expression changes, cause morphological changes in tissue that can be observed at the microscopic level. The relationship between morphological patterns and some molecular phenotypes can be exploited to predict those phenotypes directly from routine haematoxylin and eosin (H&E) stained whole slide images (WSIs) using deep convolutional neural networks (CNNs). In this study, we propose a new, computationally efficient approach for disease-specific modelling of relationships between morphology and gene expression, and we conduct the first transcriptome-wide analysis in prostate cancer, using CNNs to predict bulk RNA-sequencing estimates from WSIs of H&E-stained tissue. The work is based on the TCGA PRAD study and includes both WSIs and RNA-seq data for 370 patients. Of 15586 protein-coding and sufficiently frequently expressed transcripts, 6618 had predicted expression significantly associated with RNA-seq estimates (FDR-adjusted p-value < 1 × 10⁻⁴) in cross-validation. Of these, 5419 (81.9%) were subsequently validated in a held-out test set. We also demonstrate the ability to predict a prostate-cancer-specific cell cycle progression score directly from WSIs. These findings suggest that contemporary computer vision models offer an inexpensive and scalable solution for predicting gene expression phenotypes directly from WSIs, providing opportunities for cost-effective large-scale research studies and molecular diagnostics.
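To illustrate the transcript selection step described above, here is a minimal sketch of filtering transcripts by FDR-adjusted p-value. The abstract does not state which association test or FDR procedure was used, so the Benjamini-Hochberg adjustment is an assumption. The function names (`benjamini_hochberg`, `significant_transcripts`), the gene labels, and the toy p-values are all hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the abstract only says "FDR-adjusted"; BH is used here
    as a common choice, not as the authors' confirmed procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_transcripts(p_by_transcript, alpha=1e-4):
    """Names of transcripts whose adjusted p-value is below alpha."""
    names = list(p_by_transcript)
    adjusted = benjamini_hochberg([p_by_transcript[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q < alpha]


# Hypothetical per-transcript p-values (illustrative only, not study data).
toy_p = {"GENE_A": 1e-9, "GENE_B": 2e-5, "GENE_C": 0.03, "GENE_D": 0.6}
print(significant_transcripts(toy_p))

# Validation rate reported in the abstract: 5419 of 6618 transcripts.
print(f"{5419 / 6618:.1%}")
```

In a setting like the one described, the same filter would be applied once to the cross-validation p-values, and the surviving transcripts would then be re-tested on the held-out set.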