Vision Transformers (ViTs) have shown great performance in self-supervised learning of global and local representations that can be transferred to downstream applications. Inspired by these results, we introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis. Specifically, we propose: (i) a new 3D transformer-based model, dubbed Swin UNEt TRansformers (Swin UNETR), with a hierarchical encoder for self-supervised pre-training; (ii) tailored proxy tasks for learning the underlying pattern of human anatomy. We demonstrate successful pre-training of the proposed model on 5,050 publicly available computed tomography (CT) images from various body organs. The effectiveness of our approach is validated by fine-tuning the pre-trained models on the Beyond the Cranial Vault (BTCV) Segmentation Challenge with 13 abdominal organs and on segmentation tasks from the Medical Segmentation Decathlon (MSD) dataset. Our model is currently the state-of-the-art (i.e., ranked 1st) on the public test leaderboards of both the MSD and BTCV datasets. Code: https://monai.io/research/swin-unetr
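For readers who want to try the fine-tuning setup, the sketch below shows how the Swin UNETR architecture can be instantiated through the MONAI implementation referenced by the code link. It is a minimal illustration, not the authors' training pipeline; the sub-volume size, feature size, and use of gradient checkpointing are assumptions based on a single-channel CT input and the 14 BTCV label classes (13 abdominal organs plus background).

```python
# Minimal sketch: instantiating Swin UNETR via MONAI and running a forward
# pass on a dummy CT sub-volume. Hyperparameters here are illustrative
# assumptions, not the paper's exact configuration.
import torch
from monai.networks.nets import SwinUNETR

model = SwinUNETR(
    img_size=(96, 96, 96),  # input sub-volume size (deprecated in newer MONAI versions)
    in_channels=1,          # single-channel CT input
    out_channels=14,        # assumed: 13 BTCV organs + background
    feature_size=48,        # base embedding dimension of the hierarchical encoder
    use_checkpoint=True,    # gradient checkpointing to reduce GPU memory
)

x = torch.randn(1, 1, 96, 96, 96)  # (batch, channel, depth, height, width)
with torch.no_grad():
    logits = model(x)              # -> shape (1, 14, 96, 96, 96)
print(logits.shape)
```

Pre-trained encoder weights could then be loaded into this model before fine-tuning on the downstream segmentation task.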