Large-scale supervised pretraining is rapidly reshaping 3D medical image segmentation. However, existing efforts focus primarily on increasing dataset size and overlook the question of whether the backbone network is an effective representation learner at scale. In this work, we address this gap by revisiting ConvNeXt-based architectures for volumetric segmentation and introducing MedNeXt-v2, a compound-scaled 3D ConvNeXt that leverages improved micro-architecture and data scaling to deliver state-of-the-art performance. First, we show that the backbones routinely used in large-scale pretraining pipelines are often suboptimal. We then conduct comprehensive backbone benchmarking prior to scaling and demonstrate that stronger from-scratch performance reliably predicts stronger downstream performance after pretraining. Guided by these findings, we incorporate a 3D Global Response Normalization module and apply depth, width, and context scaling to improve our architecture for effective representation learning. We pretrain MedNeXt-v2 on 18k CT volumes and demonstrate state-of-the-art performance when fine-tuning across six challenging CT and MR benchmarks (144 structures), with consistent gains over seven publicly released pretrained models. Beyond these improvements, our benchmarking of these models also reveals that stronger backbones yield better results on similar data, that representation scaling disproportionately benefits pathological segmentation, and that modality-specific pretraining offers negligible benefit once full fine-tuning is applied. In conclusion, our results establish MedNeXt-v2 as a strong backbone for large-scale supervised representation learning in 3D medical image segmentation. Our code and pretrained models are made available with the official nnUNet repository at: https://www.github.com/MIC-DKFZ/nnUNet
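For intuition, the Global Response Normalization (GRN) mentioned above can be sketched as follows. This is a minimal numpy illustration of the GRN operation (global per-channel aggregation, divisive normalization across channels, learnable affine recalibration with an identity shortcut) extended to volumetric features; the names `grn_3d`, `gamma`, and `beta` are illustrative, and the exact MedNeXt-v2 implementation may differ in detail.

```python
import numpy as np

def grn_3d(x, gamma, beta, eps=1e-6):
    """Sketch of a 3D Global Response Normalization module.

    x: feature map of shape (C, D, H, W).
    gamma, beta: learnable per-channel parameters of shape (C,).
    """
    # 1) Global aggregation: per-channel L2 norm over the spatial (D, H, W) axes.
    g = np.sqrt((x ** 2).sum(axis=(1, 2, 3)))            # shape (C,)
    # 2) Divisive normalization: each channel's response relative to the channel mean.
    n = g / (g.mean() + eps)                             # shape (C,)
    # 3) Affine recalibration plus identity shortcut, as in GRN.
    n = n[:, None, None, None]
    gamma = gamma[:, None, None, None]
    beta = beta[:, None, None, None]
    return gamma * (x * n) + beta + x
```

Note that with `gamma = 0` and `beta = 0` the module reduces to the identity, which lets it be inserted into a pretrained block without disrupting existing features.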