When oncologists estimate cancer patient survival, they rely on multimodal data. Although several multimodal deep learning methods have been proposed in the literature, most rely on two or more independent networks that share knowledge only at a late stage of the overall model. Oncologists, in contrast, do not analyze each source in isolation; they fuse information from multiple sources, such as medical images and patient history, in their reasoning. This work proposes a deep learning method that mimics oncologists' analytical behavior when quantifying cancer and estimating patient survival. We propose TMSS, an end-to-end Transformer-based Multimodal network for Segmentation and Survival prediction that exploits the ability of transformers to handle heterogeneous modalities. The model was trained and validated for the segmentation and prognosis tasks on the training dataset of the HEad & neCK TumOR segmentation and outcome prediction in PET/CT images challenge (HECKTOR). We show that the proposed prognostic model significantly outperforms state-of-the-art methods with a concordance index of 0.763 ± 0.14, while achieving a dice score of 0.772 ± 0.030 that is comparable to a standalone segmentation model. The code is publicly available.
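To make the fusion idea concrete, below is a minimal PyTorch sketch of a TMSS-style model: CT/PET image patches and an EHR feature vector are embedded into one shared token sequence, a single transformer encoder attends across both modalities jointly (rather than fusing separate networks at a late stage), and two heads decode a segmentation mask and a survival risk score. All names, dimensions, and the decoder design here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a TMSS-style multimodal transformer (illustrative only).
# Assumptions (not from the paper): 96^3 input volumes with CT and PET stacked
# as 2 channels, patch size 16, an 8-dimensional EHR vector, and a simple
# transposed-convolution decoder. Hyperparameters are placeholders.
import torch
import torch.nn as nn


class TMSSSketch(nn.Module):
    def __init__(self, in_ch=2, ehr_dim=8, d_model=256, depth=4, patch=16, vol=96):
        super().__init__()
        self.grid = vol // patch                      # 6 patches per axis
        self.n_patches = self.grid ** 3               # 216 image tokens
        # Joint embedding: 3D patches and the EHR record map to the same width,
        # so one encoder attends across both modalities at once.
        self.patch_embed = nn.Conv3d(in_ch, d_model, kernel_size=patch, stride=patch)
        self.ehr_embed = nn.Linear(ehr_dim, d_model)  # EHR becomes one extra token
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Head 1: upsample the image tokens back to a voxel-wise tumor mask.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(d_model, 128, 2, 2), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 2, 2), nn.ReLU(),
            nn.ConvTranspose3d(64, 32, 2, 2), nn.ReLU(),
            nn.ConvTranspose3d(32, 16, 2, 2), nn.ReLU(),
            nn.Conv3d(16, 1, 1),
        )
        # Head 2: a scalar risk score for survival (e.g. for a Cox-style loss).
        self.risk = nn.Linear(d_model, 1)

    def forward(self, image, ehr):
        b = image.size(0)
        img_tok = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, 216, d)
        ehr_tok = self.ehr_embed(ehr).unsqueeze(1)                    # (B, 1, d)
        z = self.encoder(torch.cat([img_tok, ehr_tok], dim=1) + self.pos)
        img_z = z[:, :-1].transpose(1, 2).reshape(b, -1, self.grid, self.grid, self.grid)
        return self.decoder(img_z), self.risk(z[:, -1])  # mask logits, risk score


# Usage: one PET/CT volume pair plus a clinical-record vector per patient.
model = TMSSSketch()
mask_logits, risk = model(torch.randn(1, 2, 96, 96, 96), torch.randn(1, 8))
print(mask_logits.shape, risk.shape)  # (1, 1, 96, 96, 96) and (1, 1)
```

Training such a sketch end to end would optimize both heads at once, for example a dice loss on the mask logits plus a ranking or Cox-style loss on the risk scores, which is what lets the shared encoder serve both the segmentation and prognosis tasks.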