Fully transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study

Sophia J. Wagner, Daniel Reisenbüchler, Nicholas P. West, Jan Moritz Niehues, Gregory Patrick Veldhuizen, Philip Quirke, Heike I. Grabsch, Piet A. van den Brandt, Gordon G. A. Hutchins, Susan D. Richman, Tanwei Yuan, Rupert Langer, Josien Christina Anna Jenniskens, Kelly Offermans, Wolfram Mueller, Richard Gray, Stephen B. Gruber, Joel K. Greenson, Gad Rennert, Joseph D. Bonner, Daniel Schmolze, Jacqueline A. James, Maurice B. Loughrey, Manuel Salto-Tellez, Hermann Brenner, Michael Hoffmeister, Daniel Truhn, Julia A. Schnabel, Melanie Boxberg, Tingying Peng, Jakob Nikolas Kather

Background: Deep learning (DL) can extract predictive and prognostic biomarkers from routine pathology slides in colorectal cancer (CRC). For example, a DL test for the diagnosis of microsatellite instability (MSI) in CRC was approved in 2022. Current approaches rely on convolutional neural networks (CNNs). Transformer networks are outperforming CNNs and replacing them in many applications, but they have not been used for biomarker prediction in cancer at a large scale. In addition, most DL approaches have been trained on small patient cohorts, which limits their clinical utility. Methods: In this study, we developed a new fully transformer-based pipeline for end-to-end biomarker prediction from pathology slides. We combine a pre-trained transformer encoder with a transformer network for patch aggregation, capable of yielding single-target and multi-target predictions at the patient level. We train our pipeline on over 9,000 patients from 10 colorectal cancer cohorts. Results: A fully transformer-based approach substantially improves performance, generalizability, data efficiency, and interpretability compared with current state-of-the-art algorithms. After training on a large multicenter cohort, we achieve a sensitivity of 0.97 with a negative predictive value of 0.99 for MSI prediction on surgical resection specimens. We demonstrate for the first time that training on resection specimens alone reaches clinical-grade performance on endoscopic biopsy tissue, solving a long-standing diagnostic problem. Interpretation: A fully transformer-based end-to-end pipeline trained on thousands of pathology slides yields clinical-grade performance for biomarker prediction on surgical resections and biopsies. Our new methods are freely available under an open source license.
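The two metrics reported in the Results, sensitivity and negative predictive value (NPV), follow directly from confusion-matrix counts. The sketch below is only an illustration of those standard definitions: the function names are hypothetical, and the counts are made-up numbers that are not taken from the study and do not reproduce its results.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly positive (e.g. MSI) cases that the model flags as positive."""
    if tp + fn == 0:
        raise ValueError("no positive cases: sensitivity is undefined")
    return tp / (tp + fn)


def negative_predictive_value(tn: int, fn: int) -> float:
    """Share of negative predictions that are truly negative."""
    if tn + fn == 0:
        raise ValueError("no negative predictions: NPV is undefined")
    return tn / (tn + fn)


# Made-up counts for illustration only (NOT data from the paper).
tp, fn, tn, fp = 45, 5, 400, 50

print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"NPV         = {negative_predictive_value(tn, fn):.3f}")
```

A high NPV matters for a rule-out test: when the model predicts "not MSI", that prediction is rarely wrong. Note that NPV depends on how common the biomarker is in the cohort, whereas sensitivity does not.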
