TransAttUnet: Multi-level Attention-guided U-Net with Transformer for Medical Image Segmentation

With the development of deep encoder-decoder architectures and large-scale annotated medical datasets, automatic medical image segmentation has made great progress. However, because they stack convolution layers and apply consecutive sampling operations, existing standard models inevitably suffer from information recession in their feature representations and therefore fail to fully model global contextual feature dependencies. To overcome these challenges, this paper proposes a novel Transformer-based medical image semantic segmentation framework called TransAttUnet, in which multi-level guided attention and multi-scale skip connections are jointly designed to enhance the functionality and flexibility of the traditional U-shaped architecture. Inspired by the Transformer, a novel self-aware attention (SAA) module combining Transformer Self Attention (TSA) and Global Spatial Attention (GSA) is incorporated into TransAttUnet to learn the non-local interactions between encoder features. In addition, we establish multi-scale skip connections between decoder blocks to aggregate upsampled features of different semantic scales. In this way, the representation of multi-scale context information is strengthened to generate discriminative features. Benefiting from these complementary components, the proposed TransAttUnet can effectively alleviate the loss of fine details caused by information recession, improving the diagnostic sensitivity and segmentation quality of medical image analysis. Extensive experiments on multiple medical image segmentation datasets covering different imaging modalities demonstrate that our method consistently outperforms the state-of-the-art baselines.
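The abstract names the two branches of the SAA module but gives no formulas, so the following is only a minimal NumPy sketch under stated assumptions: TSA is taken to be single-head scaled dot-product attention over flattened pixel tokens, GSA is taken to be a pairwise pixel-affinity map computed in a reduced channel space, and the two are added to the input as a weighted residual. The function names, the weights `lam_tsa` and `lam_gsa`, the random projection matrices, and the tensor sizes are all hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def transformer_self_attention(tokens, w_q, w_k, w_v):
    """Assumed TSA branch: single-head scaled dot-product attention.

    tokens has shape (N, C): one row per pixel of the encoder feature map.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, N) pixel-to-pixel scores
    return softmax(scores, axis=-1) @ v        # (N, C)


def global_spatial_attention(tokens, w_m, w_n, w_v):
    """Assumed GSA branch: affinity between all pixel pairs in a reduced space."""
    m, n = tokens @ w_m, tokens @ w_n          # (N, C_r) each
    affinity = softmax(m @ n.T, axis=-1)       # (N, N) spatial affinity map
    return affinity @ (tokens @ w_v)           # (N, C)


def self_aware_attention(feature_map, params, lam_tsa=1.0, lam_gsa=1.0):
    """Combine both branches with a residual connection (an assumption)."""
    c, h, w = feature_map.shape
    tokens = feature_map.reshape(c, h * w).T   # flatten (C, H, W) to (N, C)
    tsa = transformer_self_attention(tokens, *params["tsa"])
    gsa = global_spatial_attention(tokens, *params["gsa"])
    out = tokens + lam_tsa * tsa + lam_gsa * gsa
    return out.T.reshape(c, h, w)              # back to (C, H, W)


# Hypothetical sizes: 8 channels, 4x4 bottleneck map, 2 reduced channels.
rng = np.random.default_rng(0)
C, H, W, C_R = 8, 4, 4, 2
params = {
    "tsa": [rng.normal(scale=0.1, size=(C, C)) for _ in range(3)],
    "gsa": [
        rng.normal(scale=0.1, size=(C, C_R)),
        rng.normal(scale=0.1, size=(C, C_R)),
        rng.normal(scale=0.1, size=(C, C)),
    ],
}
x = rng.normal(size=(C, H, W))
y = self_aware_attention(x, params)
print(y.shape)
```

Both branches build an N-by-N map over all pixel positions, which is what lets every location attend to every other one regardless of distance. That is the non-local interaction a stack of small convolution kernels cannot capture directly. In a trained network the projection matrices would be learned rather than sampled at random.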
