STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training

Large-scale models pre-trained on large-scale datasets have profoundly advanced the development of deep learning. However, the state-of-the-art models for medical image segmentation are still small-scale, with parameter counts only in the tens of millions. Scaling them up by further orders of magnitude has rarely been explored. An overarching goal of exploring large-scale models is to train them on large-scale medical segmentation datasets for better transfer capacity. In this work, we design a series of Scalable and Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14 million to 1.4 billion. Notably, the 1.4B STU-Net is the largest medical image segmentation model to date. Our STU-Net is based on the nnU-Net framework due to its popularity and impressive performance. We first refine the default convolutional blocks in nnU-Net to make them scalable. Then, we empirically evaluate different scaling combinations of network depth and width, discovering that it is optimal to scale model depth and width together. We train our scalable STU-Net models on the large-scale TotalSegmentator dataset and find that increasing model size brings a stronger performance gain. This observation reveals that large models are promising for medical image segmentation. Furthermore, we evaluate the transferability of our model on 14 downstream datasets for direct inference and 3 datasets for further fine-tuning, covering various modalities and segmentation targets. We observe good performance of our pre-trained model in both direct inference and fine-tuning. The code and pre-trained models are available at https://github.com/Ziyan-Huang/STU-Net.

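Summary: Large-scale pre-trained models have driven much of the recent progress in deep learning, yet the strongest medical image segmentation models still have only tens of millions of parameters, and scaling them further has rarely been studied. The core motivation for larger models is better transferability. STU-Net is a family of scalable and transferable U-Net models ranging from 14 million to 1.4 billion parameters; the 1.4B model is the largest medical image segmentation model to date. The models are built on the nnU-Net framework, chosen because it is widely used and performs well. The authors first refine nnU-Net's default convolutional blocks so that they can be scaled, then empirically compare different combinations of depth and width scaling and find that scaling both together works best. Trained on the large-scale TotalSegmentator dataset, larger models yield larger performance gains, which points to the potential of large models in medical image segmentation. Transferability is evaluated by direct inference on 14 downstream datasets and by fine-tuning on 3 datasets, covering a range of modalities and segmentation targets; the pre-trained models perform well in both settings. Code and pre-trained models are available at https://github.com/Ziyan-Huang/STU-Net.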
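To make the depth-versus-width trade-off more concrete, here is a minimal sketch of how the parameter count of a toy 3D convolutional encoder grows under each kind of scaling. This is an illustrative assumption, not the STU-Net architecture: the function names (`conv3d_params`, `encoder_params`), the five-stage layout, the channel-doubling rule, and the specific widths and depths are all hypothetical, and the decoder is ignored entirely.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights plus biases of a single k*k*k 3D convolution."""
    return c_in * c_out * k ** 3 + c_out


def encoder_params(base_width, blocks_per_stage, num_stages=5, in_channels=1):
    """Parameter count of a toy encoder (hypothetical layout, not STU-Net).

    Each stage stacks `blocks_per_stage` convolutions, and the channel
    count doubles from one stage to the next, starting at `base_width`.
    """
    total = 0
    c_in = in_channels
    for stage in range(num_stages):
        c_out = base_width * 2 ** stage
        for _ in range(blocks_per_stage):
            total += conv3d_params(c_in, c_out)
            c_in = c_out
    return total


# Hypothetical baseline: width 32, two convolutions per stage.
baseline = encoder_params(base_width=32, blocks_per_stage=2)
deeper = encoder_params(base_width=32, blocks_per_stage=4)   # depth x2
wider = encoder_params(base_width=64, blocks_per_stage=2)    # width x2
both = encoder_params(base_width=64, blocks_per_stage=4)     # depth and width x2

for name, count in [("baseline", baseline), ("deeper", deeper),
                    ("wider", wider), ("both", both)]:
    print(f"{name:>8}: {count / 1e6:8.2f}M params ({count / baseline:.2f}x)")
```

In this toy setting, doubling the width roughly quadruples the parameter count because each convolution's weights scale with the product of input and output channels, while doubling the depth grows it roughly linearly. Scaling both together multiplies the two effects, which is one way a model family can span two orders of magnitude in size.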