There has been exploding interest in embracing Transformer-based architectures for medical image segmentation. However, the lack of large-scale annotated medical datasets makes achieving performance equivalent to that on natural images challenging. Convolutional networks, in contrast, have higher inductive biases and are consequently easier to train to high performance. Recently, the ConvNeXt architecture attempted to modernize the standard ConvNet by mirroring Transformer blocks. In this work, we improve upon this to design a modernized and scalable convolutional architecture customized to the challenges of data-scarce medical settings. We introduce MedNeXt, a Transformer-inspired large-kernel segmentation network that introduces: 1) a fully ConvNeXt 3D Encoder-Decoder Network for medical image segmentation; 2) residual ConvNeXt up- and downsampling blocks to preserve semantic richness across scales; 3) a novel technique to iteratively increase kernel sizes by upsampling small-kernel networks, preventing performance saturation on limited medical data; and 4) compound scaling of MedNeXt at multiple levels (depth, width, kernel size). This leads to state-of-the-art performance on 4 tasks spanning CT and MRI modalities and varying dataset sizes, representing a modernized deep architecture for medical image segmentation.
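To make the block design concrete, below is a minimal PyTorch-style sketch, not the authors' released code, of a ConvNeXt-style 3D block as described above: a large-kernel depthwise convolution, followed by a pointwise expansion with GELU and a pointwise compression, wrapped in a residual connection. The `upsample_kernel` helper illustrates the kernel-upsampling idea in point 3 by trilinearly interpolating trained small-kernel depthwise weights to a larger kernel size. Class and function names, the choice of GroupNorm, and the default hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MedNeXtStyleBlock(nn.Module):
    """Illustrative ConvNeXt-style 3D block (sketch, not the official MedNeXt code)."""

    def __init__(self, channels: int, kernel_size: int = 5, expansion: int = 4):
        super().__init__()
        # Large-kernel depthwise 3D convolution (per-channel spatial mixing).
        self.dw_conv = nn.Conv3d(channels, channels, kernel_size,
                                 padding=kernel_size // 2, groups=channels)
        self.norm = nn.GroupNorm(num_groups=channels, num_channels=channels)
        # Pointwise expansion and compression (channel mixing), mirroring the
        # inverted-bottleneck / MLP structure of Transformer blocks.
        self.expand = nn.Conv3d(channels, expansion * channels, kernel_size=1)
        self.act = nn.GELU()
        self.compress = nn.Conv3d(expansion * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.norm(self.dw_conv(x))
        x = self.compress(self.act(self.expand(x)))
        return x + residual  # residual connection preserves information across the block


def upsample_kernel(small_weight: torch.Tensor, target_k: int) -> torch.Tensor:
    """Trilinearly interpolate depthwise conv weights of shape (C, 1, k, k, k)
    from a trained small-kernel network to initialize a larger kernel."""
    return F.interpolate(small_weight, size=(target_k,) * 3,
                         mode="trilinear", align_corners=True)


if __name__ == "__main__":
    block = MedNeXtStyleBlock(channels=32, kernel_size=3)
    x = torch.randn(1, 32, 16, 16, 16)
    print(block(x).shape)  # torch.Size([1, 32, 16, 16, 16])

    # Initialize a 5x5x5 depthwise kernel from the trained 3x3x3 weights.
    big_w = upsample_kernel(block.dw_conv.weight.data, target_k=5)
    print(big_w.shape)  # torch.Size([32, 1, 5, 5, 5])
```

In this sketch the residual is added without any per-block downsampling; in the paper's design, dedicated residual up- and downsampling blocks (point 2) handle resolution changes between encoder-decoder stages.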