Histology-informed tiling of whole tissue sections improves the interpretability and predictability of cancer relapse and genetic alterations

Willem Bonnaffé, Yang Hu, Andrea Chatrian, Mengran Fan, Stefano Malacrino, Sandy Figiel, CRUK ICGC Prostate Group, Srinivasa R. Rao, Richard Colling, Richard J. Bryant, Freddie C. Hamdy, Dan J. Woodcock, Ian G. Mills, Clare Verrill, Jens Rittscher

From arXiv, 26 pages, 6 figures

Histopathologists establish cancer grade by assessing histological structures, such as glands in prostate cancer. Yet digital pathology pipelines often rely on grid-based tiling that ignores tissue architecture, which introduces irrelevant information and limits interpretability. We introduce histology-informed tiling (HIT), which uses semantic segmentation to extract glands from whole slide images (WSIs) as biologically meaningful input patches for multiple-instance learning (MIL) and phenotyping. Trained on 137 samples from the ProMPT cohort, HIT achieved a gland-level Dice score of 0.83 ± 0.17. By extracting 380,000 glands from 760 WSIs across the ICGC-C and TCGA-PRAD cohorts, HIT improved the AUCs of MIL models by 10% for detecting copy number variations (CNVs) in genes related to epithelial-mesenchymal transition (EMT) and MYC, and revealed 15 gland clusters, several of which were associated with cancer relapse, oncogenic mutations, and high Gleason score. HIT therefore improved the accuracy and interpretability of MIL predictions, while streamlining computation by focussing on biologically meaningful structures during feature extraction.
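The two quantities the abstract leans on, the Dice score and the idea of tiling by gland rather than by grid, can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not say how glands are delineated from the segmentation output or how the gland-level Dice is aggregated. The sketch assumes a binary gland mask, treats each 4-connected foreground region as one gland, and computes Dice over the whole mask. The function names dice_score and gland_boxes are hypothetical.

```python
from collections import deque


def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as lists of 0/1 rows."""
    inter = sum(p * t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def gland_boxes(mask):
    """Return one inclusive bounding box (top, left, bottom, right) per
    4-connected foreground region, in row-major order of discovery."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one region, tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy 4x5 masks standing in for a segmentation output and its annotation.
truth = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
pred = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

score = dice_score(pred, truth)   # 2 * 7 / (7 + 8)
boxes = gland_boxes(pred)         # one box per predicted gland
```

Each box would then be cropped from the slide and passed to a feature extractor as one MIL instance, in place of a fixed grid tile. On real WSIs a library routine for connected-component labelling would replace the hand-written flood fill.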
