Given the prevalence of 3D medical imaging modalities such as MRI and CT in diagnosing and treating diverse diseases, 3D segmentation is one of the fundamental tasks of medical image analysis. Recently, Transformer-based models have started to achieve state-of-the-art performance across many vision tasks, through pre-training on large-scale natural image benchmark datasets. While works on medical image analysis have also begun to explore Transformer-based models, there is currently no optimal strategy to effectively leverage pre-trained Transformers, primarily due to the difference in dimensionality between 2D natural images and 3D medical images. Existing solutions either split 3D images into 2D slices and predict each slice independently, thereby losing crucial depth-wise information, or modify the Transformer architecture to support 3D inputs without leveraging pre-trained weights. In this work, we use a simple yet effective weight inflation strategy to adapt pre-trained Transformers from 2D to 3D, retaining the benefit of both transfer learning and depth information. We further investigate the effectiveness of transfer from different pre-training sources and objectives. Our approach achieves state-of-the-art performance across a broad range of 3D medical image datasets, and can serve as a standard strategy, easily adopted by any work on Transformer-based models for 3D medical images, to maximize performance.
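To make the weight inflation idea concrete, below is a minimal sketch of how a pre-trained 2D convolutional kernel (e.g., a ViT patch-embedding projection) might be inflated to 3D. This is an illustration, not the paper's exact implementation: the function names and the PyTorch framing are assumptions, and the two inflation modes shown (depth-wise averaging in the style of I3D, and centering, which zeroes all but the central depth slice) are common choices from the video-model literature rather than claims about which variant the paper adopts.

```python
import torch
import torch.nn as nn

def inflate_conv_weight(weight_2d: torch.Tensor, depth: int,
                        mode: str = "average") -> torch.Tensor:
    """Inflate a 2D conv kernel (out_ch, in_ch, kH, kW) into a 3D kernel
    (out_ch, in_ch, depth, kH, kW).

    "average": replicate along depth and divide by depth, so an input that
    is constant along depth yields the same activations as the 2D model.
    "center": place the 2D kernel at the central depth slice and zero the
    rest, so the 3D model initially behaves like slice-wise 2D inference.
    (Both modes are standard inflation variants; which one a given paper
    uses is a detail not stated in this abstract.)
    """
    out_ch, in_ch, kh, kw = weight_2d.shape
    if mode == "average":
        return weight_2d.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
    elif mode == "center":
        weight_3d = weight_2d.new_zeros(out_ch, in_ch, depth, kh, kw)
        weight_3d[:, :, depth // 2] = weight_2d
        return weight_3d
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical usage: inflate a ViT-style 16x16 patch-embedding conv
# (in practice loaded from pre-trained weights) into a 3D patch embedding.
conv2d = nn.Conv2d(3, 768, kernel_size=16, stride=16)
conv3d = nn.Conv3d(3, 768, kernel_size=(4, 16, 16), stride=(4, 16, 16))
with torch.no_grad():
    conv3d.weight.copy_(inflate_conv_weight(conv2d.weight, depth=4))
    conv3d.bias.copy_(conv2d.bias)
```

The appeal of this adaptation is that the rest of the Transformer (attention and MLP blocks) operates on token sequences and is dimension-agnostic, so its pre-trained weights can be reused unchanged; only the patch embedding (and, depending on the design, the positional embeddings) needs inflation.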