CNN-based methods have achieved impressive results in medical image segmentation, but they fail to capture long-range dependencies because of the inherent locality of the convolution operation. Transformer-based methods have recently become popular in vision tasks because of their capacity to model long-range dependencies, and they achieve promising performance. However, they are weak at modeling local context. Some works have attempted to embed convolutional layers to overcome this problem and achieved some improvement, but doing so makes the features inconsistent and fails to leverage the natural multi-scale features of a hierarchical transformer, which limits model performance. In this paper, taking medical image segmentation as an example, we present MISSFormer, an effective and powerful Medical Image Segmentation tranSFormer. MISSFormer is a hierarchical encoder-decoder network with two appealing designs: 1) the feed-forward network is redesigned in the proposed Enhanced Transformer Block, which aligns features adaptively and enhances both long-range dependencies and local context; 2) we propose the Enhanced Transformer Context Bridge, a context bridge built on the Enhanced Transformer Block that models the long-range dependencies and local context of the multi-scale features generated by our hierarchical transformer encoder. Driven by these two designs, MISSFormer shows a strong capacity to capture more valuable dependencies and context in medical image segmentation. Experiments on multi-organ and cardiac segmentation tasks demonstrate the superiority, effectiveness, and robustness of MISSFormer: trained from scratch, it even outperforms state-of-the-art methods pretrained on ImageNet, and its core designs can be generalized to other visual segmentation tasks. The code will be released on GitHub.
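To make the idea of a context bridge over multi-scale features more concrete, the following is a minimal sketch of the token bookkeeping such a bridge might need. It is not the released implementation. It assumes, without support from the abstract, that the encoder has four stages with strides 4, 8, 16, and 32, and that the bridge flattens each stage's feature map into tokens, concatenates them into one sequence, applies a mixing step, and splits the result back per stage. The names `stage_shapes`, `token_slices`, `bridge`, and `mix` are hypothetical, and each position holds a single scalar rather than a channel vector to keep the example in plain Python.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[float]]


def stage_shapes(height: int, width: int,
                 strides: Sequence[int] = (4, 8, 16, 32)) -> List[Tuple[int, int]]:
    """Spatial size of each stage's feature map (strides are an assumption)."""
    shapes = []
    for s in strides:
        if height % s or width % s:
            raise ValueError(f"{height}x{width} is not divisible by stride {s}")
        shapes.append((height // s, width // s))
    return shapes


def token_slices(shapes: Sequence[Tuple[int, int]]) -> List[slice]:
    """Slice of the concatenated token sequence owned by each stage."""
    slices, start = [], 0
    for h, w in shapes:
        slices.append(slice(start, start + h * w))
        start += h * w
    return slices


def bridge(features: Sequence[Grid],
           mix: Callable[[List[float]], List[float]]) -> List[Grid]:
    """Flatten all stages into one sequence, apply `mix`, and split back.

    `mix` stands in for whatever block models dependencies across scales;
    it must return a list of the same length as its input.
    """
    shapes = [(len(f), len(f[0])) for f in features]
    tokens = [v for f in features for row in f for v in row]
    mixed = mix(tokens)
    if len(mixed) != len(tokens):
        raise ValueError("mix must preserve the number of tokens")
    out = []
    for (h, w), sl in zip(shapes, token_slices(shapes)):
        flat = mixed[sl]
        out.append([flat[r * w:(r + 1) * w] for r in range(h)])
    return out


def global_mean(tokens: List[float]) -> List[float]:
    """Toy mixing step: every token becomes the mean over all scales."""
    m = sum(tokens) / len(tokens)
    return [m] * len(tokens)
```

Under these assumptions, a 224×224 input yields 56×56, 28×28, 14×14, and 7×7 maps, so the concatenated sequence has 4165 tokens. With the toy `global_mean` mixer, every position in every stage receives information from all scales, which is the property the bridge is meant to provide; a real model would use an attention-based block in place of `mix`.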