The Transformer, as a new generation of neural architecture, has demonstrated remarkable performance in natural language processing and computer vision. However, existing vision Transformers struggle to learn from limited medical data and fail to generalize across diverse medical image tasks. To tackle these challenges, we present MedFormer, a data-scalable Transformer for generalizable medical image segmentation. Its key designs incorporate a desirable inductive bias, hierarchical modeling with linear-complexity attention, and multi-scale feature fusion that is both spatially and semantically global. MedFormer can learn across tiny- to large-scale data without pre-training. Extensive experiments demonstrate the potential of MedFormer as a general segmentation backbone, outperforming CNNs and vision Transformers on three public datasets covering multiple modalities (e.g., CT and MRI) and diverse medical targets (e.g., healthy organs, diseased tissue, and tumors). We make the models and evaluation pipeline publicly available, offering solid baselines and unbiased comparisons to promote a wide range of downstream clinical applications.
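The abstract does not spell out how the attention is made linear in the number of tokens. As a rough illustration only, and not the paper's actual mechanism, the sketch below shows one common way to achieve linear complexity (in the style of efficient attention): normalizing queries and keys separately so the key-value context can be aggregated before multiplying by the queries, avoiding the N x N attention matrix. The module name `LinearAttention` and its parameters are hypothetical.

```python
import torch
import torch.nn as nn

class LinearAttention(nn.Module):
    """Illustrative attention with O(N) complexity in token count N.

    Sketch under assumptions: softmax is applied to queries over the
    channel dimension and to keys over the N positions, so a small
    d x d key-value context can be formed first. This is a generic
    linear-attention variant, not necessarily MedFormer's design.
    """

    def __init__(self, dim, heads=8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, C) flattened image tokens
        B, N, C = x.shape
        h, d = self.heads, C // self.heads
        # Project to queries, keys, values: each (B, h, N, d)
        q, k, v = self.qkv(x).reshape(B, N, 3, h, d).permute(2, 0, 3, 1, 4)
        q = q.softmax(dim=-1)   # normalize over channels
        k = k.softmax(dim=-2)   # normalize over the N positions
        # Aggregate key-value context first: (B, h, d, d), cost O(N * d^2)
        context = k.transpose(-2, -1) @ v
        out = (q @ context).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

Because the context matrix is d x d rather than N x N, the cost grows linearly with the number of tokens, which is what makes attention of this kind tractable for high-resolution volumetric medical images.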