The Transformer has been applied successfully to medical image segmentation because of its strong long-range modeling capability. However, building a Transformer-based model requires partitioning the image into patches, a step that can disrupt tissue structure in medical images and discard relevant information. In this study, we propose the Heterogeneous Swin Transformer with Multi-Receptive Field (HST-MRF), a model based on U-shaped networks for medical image segmentation. Its main purpose is to address the loss of structural information caused by patch partitioning in Transformers by fusing patch information obtained under different receptive fields. The core module, the heterogeneous Swin Transformer (HST), uses heterogeneous attention to let patch information from multiple receptive fields interact and passes the result to the next stage for progressive learning. We also design a two-stage fusion module, multimodal bilinear pooling (MBP), which assists HST in further fusing multi-receptive-field information and combines low-level and high-level semantic information to localize lesion regions accurately. In addition, we develop adaptive patch embedding (APE) and soft channel attention (SCA) modules, which retain more valuable information when acquiring patch embeddings and filtering channel features, respectively, thereby improving segmentation quality. We evaluate HST-MRF on multiple datasets for polyp and skin lesion segmentation. Experimental results show that the proposed method outperforms state-of-the-art models. Furthermore, ablation experiments verify the effectiveness of each module and the benefit of multi-receptive-field partitioning in reducing the loss of structural information.
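The structural-disruption problem described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the helper names `partition` and `patches_touching` and the 8x8 mask are hypothetical, chosen only to show that a small structure can be split across several patches at one patch size yet stay intact at another, which is the motivation for combining several receptive fields.

```python
def partition(image, patch):
    """Split a 2D list into non-overlapping patch x patch tiles, keyed by tile index."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    tiles = {}
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles[(top // patch, left // patch)] = [
                row[left:left + patch] for row in image[top:top + patch]
            ]
    return tiles


def patches_touching(image, patch):
    """Return the sorted tile indices that contain at least one foreground pixel."""
    return sorted(
        idx
        for idx, tile in partition(image, patch).items()
        if any(any(row) for row in tile)
    )


# Toy 8x8 binary mask with a 2x2 "lesion" at rows 1-2, columns 1-2 (assumed data).
image = [[0] * 8 for _ in range(8)]
for r in (1, 2):
    for c in (1, 2):
        image[r][c] = 1

for patch in (2, 4):
    touched = patches_touching(image, patch)
    print(f"patch size {patch}: lesion spans {len(touched)} patch(es) {touched}")
```

With 2x2 patches the lesion is cut into four separate patches, while with 4x4 patches it falls inside a single patch. A model that sees only one patch size inherits whichever cut that size imposes; fusing information across sizes gives it a chance to recover structure that a single partition breaks.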