Digital pathological analysis is the main examination used for cancer diagnosis. Recently, deep learning-driven feature extraction from pathology images has been able to detect genetic variations and the tumor environment, but few studies have focused on differential gene expression in tumor cells. In this paper, we propose HistCode, a self-supervised contrastive learning framework that infers differential gene expression from whole slide images (WSIs). We apply contrastive learning to large-scale unannotated WSIs to derive slide-level histopathological features in latent space, and then transfer these features to tumor diagnosis and to the prediction of differentially expressed cancer driver genes. Our extensive experiments show that our method outperforms other state-of-the-art models in tumor diagnosis tasks and also effectively predicts differential gene expression. Interestingly, we found that genes with higher fold changes can be predicted more precisely. To illustrate the framework's ability to extract informative features from pathological images, we spatially visualized the WSIs, coloring the image tiles by their attention scores. We found that the highlighted tumor and necrosis areas were highly consistent with the annotations of experienced pathologists. Moreover, the spatial heatmap generated from lymphocyte-specific gene expression patterns was also consistent with the manually labeled WSI.
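The abstract mentions slide-level features and per-tile attention scores but does not say how they are computed. As a rough illustration only, the sketch below assumes a softmax-weighted pooling of tile embeddings, in the style of attention-based multiple-instance learning. The function names `softmax` and `attention_pool`, the toy embeddings, and the attention logits are all hypothetical and are not taken from HistCode.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tile_embeddings, attention_logits):
    """Aggregate tile embeddings into one slide-level vector.

    Assumption: each tile has an embedding and a scalar attention logit.
    Returns the attention-weighted mean embedding and the normalized
    attention weights, which could be used to color tiles in a heatmap.
    """
    if len(tile_embeddings) != len(attention_logits):
        raise ValueError("need one attention logit per tile")
    weights = softmax(attention_logits)
    dim = len(tile_embeddings[0])
    slide_vector = [
        sum(w * emb[d] for w, emb in zip(weights, tile_embeddings))
        for d in range(dim)
    ]
    return slide_vector, weights


# Toy example: three tiles with 2-dimensional embeddings (made-up values).
tiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [0.0, 0.0, 2.0]  # the third tile receives the most attention
slide_vector, weights = attention_pool(tiles, logits)
```

In this sketch the normalized weights play the role of the attention scores used for visualization: a tile with a larger weight contributes more to the slide-level vector and would appear more strongly highlighted in the heatmap.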