Artificial intelligence (AI) models have been developed to predict clinically relevant biomarkers, including microsatellite instability (MSI), in colorectal cancer (CRC). However, current deep-learning networks are data-hungry and require large training datasets, which are often lacking in the medical domain. In this study, building on the Hierarchical Vision Transformer using Shifted Windows (Swin-T), we developed an efficient workflow for predicting CRC biomarkers (MSI, hypermutation, chromosomal instability, CpG island methylator phenotype, and BRAF and TP53 mutation) that required only relatively small datasets yet achieved state-of-the-art (SOTA) predictive performance. Our Swin-T workflow not only substantially outperformed published models in an intra-study cross-validation experiment on the TCGA-CRC-DX dataset (N = 462), but also generalized well in cross-study external validation, delivering a SOTA AUROC of 0.90 for MSI when trained on the MCO dataset (N = 1065) and tested on the same TCGA-CRC-DX dataset. Echle and colleagues achieved similar performance (AUROC = 0.91) on the same test dataset using ResNet18 with approximately 8000 training samples. Swin-T was highly efficient with small training datasets and exhibited robust predictive performance with only 200-500 training samples. These data indicate that Swin-T may be 5-10 times more efficient than the current state-of-the-art MSI algorithms based on ResNet18 and ShuffleNet. Furthermore, the Swin-T models showed promise as pre-screening tests for MSI status and BRAF mutation status: in a cascading diagnostic workflow, they could exclude samples before the subsequent standard testing, reducing turnaround time and cost.
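The pre-screening idea in the last sentence can be illustrated with a minimal sketch. The abstract does not say how a cutoff is chosen, so this assumes one plausible rule: pick the highest model score that still flags a target fraction of the truly positive samples, then exclude everything scoring below it. The function names `prescreen_threshold` and `prescreen_summary` and the toy scores are hypothetical and are not taken from the study.

```python
import math


def prescreen_threshold(scores, labels, min_sensitivity=0.95):
    """Return the highest score cutoff that still flags at least
    `min_sensitivity` of the truly positive samples (label 1).

    Assumption: this selection rule is illustrative only.
    """
    positives = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("need at least one positive sample")
    # Number of positives that must score at or above the cutoff.
    needed = math.ceil(min_sensitivity * len(positives))
    return positives[needed - 1]


def prescreen_summary(scores, labels, threshold):
    """Summarise a rule-out pre-screen: samples scoring below `threshold`
    are excluded; the rest proceed to standard testing."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    excluded = [y for s, y in zip(scores, labels) if s < threshold]
    return {
        "sensitivity": sum(flagged) / sum(labels),
        "excluded_fraction": len(excluded) / len(labels),
        "missed_positives": sum(excluded),
    }


# Synthetic toy data (not from the paper): model scores and true status.
toy_scores = [0.95, 0.80, 0.60, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

cutoff = prescreen_threshold(toy_scores, toy_labels, min_sensitivity=1.0)
print(cutoff, prescreen_summary(toy_scores, toy_labels, cutoff))
```

Under this rule, a stricter sensitivity target lowers the cutoff and excludes fewer samples, which is the trade-off a cascading workflow would need to tune on held-out data.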