Predicting microsatellite instability and key biomarkers in colorectal cancer from H&E-stained images: Achieving SOTA predictive performance with fewer data using Swin Transformer

cancer · Performer · SOTA · Swin Transformer · state-of-the-art

September 12, 2022

Bangwei Guo, Xingyu Li, Jitendra Jonnagaddala, Hong Zhang, Xu Steven Xu

Artificial intelligence (AI) models have been developed for predicting clinically relevant biomarkers, including microsatellite instability (MSI), for colorectal cancers (CRC). However, current deep-learning networks are data-hungry and require large training datasets, which are often lacking in the medical domain. In this study, based on the latest Hierarchical Vision Transformer using Shifted Windows (Swin-T), we developed an efficient workflow for predicting biomarkers in CRC (MSI, hypermutation, chromosomal instability, CpG island methylator phenotype, BRAF, and TP53 mutation) that required only relatively small datasets but achieved state-of-the-art (SOTA) predictive performance. Our Swin-T workflow not only substantially outperformed published models in an intra-study cross-validation experiment using the TCGA-CRC-DX dataset (N = 462), but also showed excellent generalizability in cross-study external validation, delivering a SOTA AUROC of 0.90 for MSI when trained on the MCO dataset (N = 1065) and tested on the same TCGA-CRC-DX dataset. Similar performance (AUROC = 0.91) was achieved by Echle and colleagues using approximately 8000 training samples (ResNet18) on the same testing dataset. Swin-T was extremely efficient with small training datasets and exhibited robust predictive performance with only 200-500 training samples. These data indicate that Swin-T may be 5-10 times more efficient than the current state-of-the-art algorithms for MSI based on ResNet18 and ShuffleNet. Furthermore, the Swin-T models showed promise as pre-screening tests for MSI status and BRAF mutation status: in a cascading diagnostic workflow, they could exclude samples before the subsequent standard testing, reducing turnaround time and cost.
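
The two quantities the abstract leans on, AUROC and the number of samples a pre-screening test could exclude, can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how operating thresholds were chosen, so the rule used here (the highest cutoff that still meets a target sensitivity) is an assumption, as are the function names `auroc`, `rule_out_threshold`, and `prescreen`.

```python
import math
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a
    random negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rule_out_threshold(labels: Sequence[int], scores: Sequence[float],
                       min_sensitivity: float) -> float:
    """Highest cutoff t such that calling `score >= t` positive still
    detects at least `min_sensitivity` of the true positives.
    (Assumed selection rule; the abstract does not specify one.)"""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one positive sample")
    k = max(1, math.ceil(min_sensitivity * len(pos)))  # positives to keep
    return pos[k - 1]


def prescreen(scores: Sequence[float],
              threshold: float) -> Tuple[List[int], List[int]]:
    """Split sample indices into those forwarded to standard testing
    and those excluded by the pre-screen."""
    forwarded = [i for i, s in enumerate(scores) if s >= threshold]
    excluded = [i for i, s in enumerate(scores) if s < threshold]
    return forwarded, excluded
```

In a cascading workflow of this kind, the cutoff would be fixed on a validation set, and only the forwarded samples would go on to the standard MSI or BRAF assay. The fraction excluded is what drives any saving in turnaround time and cost.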
