There has been growing interest in adopting Transformer-based architectures for medical image segmentation. However, the lack of large-scale annotated medical datasets makes it challenging to achieve performance on par with that on natural images. Convolutional networks, in contrast, have stronger inductive biases and are consequently easier to train to high performance. Recently, the ConvNeXt architecture attempted to modernize the standard ConvNet by mirroring the design of Transformer blocks. In this work, we improve upon this design to build a modernized and scalable convolutional architecture customized to the challenges of data-scarce medical settings. We introduce MedNeXt, a Transformer-inspired large-kernel segmentation network with the following contributions: 1) a fully ConvNeXt 3D encoder-decoder network for medical image segmentation; 2) residual ConvNeXt up- and downsampling blocks to preserve semantic richness across scales; 3) a novel technique to iteratively increase kernel size by upsampling small-kernel networks, preventing performance saturation on limited medical data; 4) compound scaling of MedNeXt at multiple levels (depth, width, kernel size). This leads to state-of-the-art performance on 4 tasks spanning CT and MRI modalities and varying dataset sizes, representing a modernized deep architecture for medical image segmentation.
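To make the block design concrete, below is a minimal PyTorch sketch of a ConvNeXt-style 3D block of the kind the abstract describes: a depthwise convolution with a (potentially large) kernel, followed by pointwise expansion and compression around a GELU nonlinearity, wrapped in a residual connection. The class name MedNeXtBlock3D, the use of GroupNorm, and the expansion ratio are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a ConvNeXt-style 3D block (assumed structure, not the
# official MedNeXt code). Requires PyTorch.
import torch
import torch.nn as nn

class MedNeXtBlock3D(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 5, expansion: int = 4):
        super().__init__()
        # Depthwise convolution with a (potentially large) kernel for spatial context.
        self.dw_conv = nn.Conv3d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,
        )
        # Channel-wise normalization (assumed GroupNorm with one group per channel).
        self.norm = nn.GroupNorm(num_groups=channels, num_channels=channels)
        # Pointwise expansion and compression mirror the inverted-bottleneck
        # (MLP-like) structure of Transformer blocks.
        self.expand = nn.Conv3d(channels, channels * expansion, kernel_size=1)
        self.act = nn.GELU()
        self.compress = nn.Conv3d(channels * expansion, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dw_conv(x)
        x = self.norm(x)
        x = self.expand(x)
        x = self.act(x)
        x = self.compress(x)
        return x + residual  # residual connection, as in the up/downsampling blocks


# Example: apply a single block to a small 3D patch.
if __name__ == "__main__":
    block = MedNeXtBlock3D(channels=32, kernel_size=5)
    out = block(torch.randn(1, 32, 16, 16, 16))
    print(out.shape)  # torch.Size([1, 32, 16, 16, 16])
```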
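The iterative kernel-size increase in contribution 3) can likewise be sketched: a network with larger kernels is initialized from a trained small-kernel checkpoint by resizing the depthwise convolution kernels with trilinear interpolation and copying all other weights unchanged. The function name upkern_init and the shape-matching heuristic below are assumptions for illustration, not the paper's exact procedure.

```python
# Sketch of initializing a large-kernel network from a small-kernel checkpoint
# by trilinearly upsampling convolution kernels (assumed helper, for illustration).
import torch
import torch.nn.functional as F

def upkern_init(large_model: torch.nn.Module, small_state: dict) -> None:
    large_state = large_model.state_dict()
    for name, small_w in small_state.items():
        if name not in large_state:
            continue
        large_w = large_state[name]
        if small_w.shape == large_w.shape:
            # Identical shape (norms, pointwise convs, biases): copy directly.
            large_state[name] = small_w
        elif small_w.dim() == 5 and small_w.shape[:2] == large_w.shape[:2]:
            # Conv3d weight (out, in, k, k, k) with a smaller spatial kernel:
            # resize the kernel to the larger size via trilinear interpolation.
            large_state[name] = F.interpolate(
                small_w, size=large_w.shape[2:],
                mode="trilinear", align_corners=False,
            )
    large_model.load_state_dict(large_state)
```

Used this way, a small-kernel network is trained first and its weights serve only as an initialization for the larger-kernel variant, which is then fine-tuned on the same data.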