While CNN-based methods have been the cornerstone of medical image segmentation due to their promising performance and robustness, they are limited in capturing long-range dependencies. Transformer-based approaches currently prevail because they enlarge the receptive field to model global contextual correlations. To extract richer representations, some extensions of the U-Net employ multi-scale feature extraction and fusion modules and obtain improved performance. Inspired by this idea, we propose TransCeption for medical image segmentation, a pure transformer-based U-shaped network that incorporates an inception-like module into the encoder and adopts a contextual bridge for better feature fusion. The design proposed in this work is based on three core principles: (1) The patch merging module in the encoder is redesigned as ResInception Patch Merging (RIPM). A multi-branch transformer (MB transformer) uses as many branches as RIPM has outputs. Combining the two modules enables the model to capture a multi-scale representation within a single stage. (2) We construct an Intra-stage Feature Fusion (IFF) module following the MB transformer to enhance the aggregation of feature maps from all branches, focusing in particular on the interaction between the channels of all scales. (3) In contrast to a bridge that contains only token-wise self-attention, we propose a Dual Transformer Bridge that also includes channel-wise self-attention to exploit correlations between scales at different stages from a dual perspective. Extensive experiments on multi-organ and skin lesion segmentation tasks demonstrate the superior performance of TransCeption compared to previous work. The code is publicly available at \url{https://github.com/mindflow-institue/TransCeption}.
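The distinction between token-wise and channel-wise self-attention drawn in principle (3) can be illustrated with a minimal sketch. It is not taken from the TransCeption code: it assumes a single head, omits the learned query/key/value projections, and picks the scaling factors for illustration only. The function names `token_attention` and `channel_attention` are hypothetical. For an input of N tokens with C channels, the token-wise variant builds an N×N attention map, while the channel-wise variant builds a C×C map.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def token_attention(x):
    """Token-wise self-attention (illustrative, no learned projections).

    x has N rows (tokens) and C columns (channels); the weights are N x N.
    """
    c = len(x[0])
    scores = matmul(x, transpose(x))  # N x N token-to-token similarities
    weights = [softmax([s / math.sqrt(c) for s in row]) for row in scores]
    return matmul(weights, x), weights  # output is N x C


def channel_attention(x):
    """Channel-wise self-attention (illustrative, no learned projections).

    The weights are C x C: each channel attends to every other channel.
    Scaling by sqrt(N) is an assumption made for this sketch.
    """
    n = len(x)
    xt = transpose(x)  # C x N
    scores = matmul(xt, x)  # C x C channel-to-channel similarities
    weights = [softmax([s / math.sqrt(n) for s in row]) for row in scores]
    return transpose(matmul(weights, xt)), weights  # output is N x C


# Toy input: N = 4 tokens, C = 3 channels.
x = [
    [1.0, 0.0, 2.0],
    [0.5, 1.0, 0.0],
    [0.0, 2.0, 1.0],
    [1.5, 0.5, 0.5],
]
token_out, token_w = token_attention(x)
channel_out, channel_w = channel_attention(x)
print(len(token_w), "x", len(token_w[0]))      # size of the token attention map
print(len(channel_w), "x", len(channel_w[0]))  # size of the channel attention map
```

Both variants return an output with the same shape as the input, which is why the two kinds of attention can be combined in one bridge; only the axis along which correlations are computed differs.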