Convolutional neural networks (CNNs) have become the de facto standard for medical image segmentation. However, because the convolution operation is inherently local, CNNs are limited in modeling long-range dependencies and spatial correlations. Transformers were developed to address this issue, but they fail to capture low-level features. Yet both local and global features have been shown to be crucial for dense prediction tasks, such as segmentation in challenging contexts. In this paper, we propose HiFormer, a novel method that efficiently bridges a CNN and a transformer for medical image segmentation. Specifically, we design two multi-scale feature representations using the seminal Swin Transformer module and a CNN-based encoder. To ensure a fine-grained fusion of the global and local features obtained from these two representations, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure. Extensive experiments on various medical image segmentation datasets demonstrate the effectiveness of HiFormer over other CNN-based, transformer-based, and hybrid methods in terms of computational complexity as well as quantitative and qualitative results. Our code is publicly available at: https://github.com/amirhossein-kz/HiFormer
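The abstract does not spell out how the DLF module combines the two feature levels. As a rough illustration only, the sketch below assumes a CrossViT-style exchange in which each level's class (CLS) token acts as the single query over the other level's tokens. The function names, the single attention head, and the identity projections (standing in for learned query/key/value matrices) are all assumptions made for this example, not the HiFormer implementation; the linked repository is the authoritative source.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cls_cross_attention(cls_token: Vector, other_tokens: List[Vector]) -> Vector:
    """Hypothetical helper: fuse one level's CLS token with another level's tokens.

    The CLS token is the only query; keys and values are the CLS token itself
    plus the other level's tokens. Identity projections replace the learned
    W_q, W_k, W_v matrices (a simplifying assumption).
    """
    d = len(cls_token)
    keys = [cls_token] + other_tokens
    scale = 1.0 / math.sqrt(d)  # scaled dot-product attention
    weights = softmax([dot(cls_token, k) * scale for k in keys])
    # Values equal keys here, so the output is a convex combination of the keys.
    attended = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)]
    # Residual connection keeps the CLS token's original information.
    return [c + a for c, a in zip(cls_token, attended)]


def double_level_fusion(
    cls_fine: Vector, tokens_fine: List[Vector],
    cls_coarse: Vector, tokens_coarse: List[Vector],
) -> Tuple[Vector, Vector]:
    """Hypothetical two-way exchange: each level's CLS token attends to the other level."""
    new_fine = cls_cross_attention(cls_fine, tokens_coarse)
    new_coarse = cls_cross_attention(cls_coarse, tokens_fine)
    return new_fine, new_coarse
```

Using only the CLS token as the query keeps the exchange cheap: its cost grows linearly with the number of tokens in the other level, rather than quadratically as full cross-attention between every pair of tokens would. For example, `double_level_fusion([1.0, 0.0], [[0.5, 0.5]], [0.0, 1.0], [[1.0, 1.0]])` returns two updated 2-dimensional CLS tokens.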