Semantic segmentation of brain tumors is a fundamental medical image analysis task involving multiple MRI imaging modalities that can assist clinicians in diagnosing the patient and subsequently studying the progression of the malignant entity. In recent years, Fully Convolutional Neural Network (FCNN) approaches have become the de facto standard for 3D medical image segmentation. The popular "U-shaped" network architecture has achieved state-of-the-art performance on a range of 2D and 3D semantic segmentation tasks and across various imaging modalities. However, because the kernel size of the convolution layers in FCNNs is limited, their ability to model long-range information is sub-optimal, which can lead to deficiencies in the segmentation of tumors of variable size. Transformer models, on the other hand, have demonstrated excellent capabilities in capturing such long-range information in multiple domains, including natural language processing and computer vision. Inspired by the success of vision transformers and their variants, we propose a novel segmentation model termed Swin UNEt TRansformers (Swin UNETR). Specifically, the task of 3D brain tumor semantic segmentation is reformulated as a sequence-to-sequence prediction problem in which multi-modal input data is projected into a 1D sequence of embeddings and used as the input to a hierarchical Swin transformer encoder. The Swin transformer encoder extracts features at five different resolutions by using shifted windows to compute self-attention, and it is connected to an FCNN-based decoder at each resolution via skip connections. We participated in the BraTS 2021 segmentation challenge, and our proposed model ranks among the top-performing approaches in the validation phase. Code: https://monai.io/research/swin-unetr
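To make the projection step concrete, the following is a minimal NumPy sketch of how a multi-modal 3D volume could be turned into a 1D sequence of embeddings. It is an illustration under stated assumptions, not the released implementation: the function name `patch_embed`, the patch size of 2, the embedding dimension of 48, the toy 8x8x8 crop, and the random projection matrix (standing in for learned weights) are all hypothetical choices made for this example.

```python
import numpy as np

def patch_embed(volume, patch=2, dim=48, seed=0):
    """Split a (C, D, H, W) volume into non-overlapping patch^3 blocks and
    linearly project each block to a `dim`-dimensional token.

    `patch`, `dim`, and the random weights are illustrative assumptions.
    """
    c, d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    gd, gh, gw = d // patch, h // patch, w // patch

    # (C, gd, p, gh, p, gw, p) -> (gd, gh, gw, C, p, p, p)
    x = volume.reshape(c, gd, patch, gh, patch, gw, patch)
    x = x.transpose(1, 3, 5, 0, 2, 4, 6)

    # One row per patch; all modalities of a patch are flattened together.
    tokens = x.reshape(gd * gh * gw, c * patch ** 3)

    # Random matrix used only as a stand-in for a learned linear projection.
    rng = np.random.default_rng(seed)
    weight = rng.standard_normal((c * patch ** 3, dim)) * 0.02
    return tokens @ weight, (gd, gh, gw)

# Toy input: 4 MRI modalities stacked as channels, 8x8x8 voxels.
volume = np.zeros((4, 8, 8, 8), dtype=np.float32)
embeddings, grid = patch_embed(volume)
print(embeddings.shape, grid)
```

Each row of `embeddings` is one token of the sequence, and `grid` records the 3D arrangement of the tokens so that a windowed attention scheme can later group spatially neighboring tokens.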