Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks. Within transformer models, the self-attention mechanism is one of the main building blocks and strives to capture long-range dependencies, in contrast to the local, convolution-based design. However, the self-attention operation has quadratic complexity, which proves to be a computational bottleneck, especially in volumetric medical imaging, where the inputs are 3D with numerous slices. In this paper, we propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks and efficiency in terms of parameters and compute cost. The core of our design is a novel efficient paired attention (EPA) block that learns spatial and channel-wise discriminative features using a pair of inter-dependent branches based on spatial and channel attention. Our spatial attention formulation is efficient, with linear complexity with respect to the input sequence length. To enable communication between the spatial and channel-focused branches, we share the weights of the query and key mapping functions, which provides a complementary benefit (paired attention) while also reducing the overall number of network parameters. Our extensive evaluations on three benchmarks, Synapse, BTCV and ACDC, reveal the effectiveness of the proposed contributions in terms of both efficiency and accuracy. On the Synapse dataset, our UNETR++ sets a new state-of-the-art with a Dice Similarity Score of 87.2%, while being significantly more efficient, with a reduction of over 71% in terms of both parameters and FLOPs compared to the best existing method in the literature. Code: https://github.com/Amshaker/unetr_plus_plus.
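To make the paired-attention idea concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes a single head and no batch dimension, and it assumes that linear complexity is obtained by projecting keys and values from n tokens down to a fixed p tokens before the spatial attention; the abstract does not state how the linear cost is reached. The names `paired_attention`, `init_params`, `proj_k` and `proj_v`, the scaling factors, and the summation used to fuse the two branches are all illustrative assumptions. What the sketch does take from the text is that the query and key mappings are shared by the spatial and channel branches.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(n, c, p, rng):
    # Hypothetical parameter layout: n tokens, c channels, p projected tokens.
    return {
        "w_q": rng.standard_normal((c, c)) / np.sqrt(c),          # shared by both branches
        "w_k": rng.standard_normal((c, c)) / np.sqrt(c),          # shared by both branches
        "w_v_spatial": rng.standard_normal((c, c)) / np.sqrt(c),  # spatial branch only
        "w_v_channel": rng.standard_normal((c, c)) / np.sqrt(c),  # channel branch only
        "proj_k": rng.standard_normal((p, n)) / np.sqrt(n),       # assumed token reduction n -> p
        "proj_v": rng.standard_normal((p, n)) / np.sqrt(n),       # assumed token reduction n -> p
    }


def paired_attention(x, params):
    """x has shape (n, c). Returns the fused output and both attention maps."""
    n, c = x.shape
    # Queries and keys are computed once and reused by both branches.
    q = x @ params["w_q"]                    # (n, c)
    k = x @ params["w_k"]                    # (n, c)
    v_spatial = x @ params["w_v_spatial"]    # (n, c)
    v_channel = x @ params["w_v_channel"]    # (n, c)

    # Spatial branch: the attention map is (n, p) rather than (n, n),
    # so its cost grows linearly in n for a fixed p.
    k_p = params["proj_k"] @ k               # (p, c)
    v_p = params["proj_v"] @ v_spatial       # (p, c)
    attn_s = softmax(q @ k_p.T / np.sqrt(c), axis=-1)  # (n, p)
    out_s = attn_s @ v_p                     # (n, c)

    # Channel branch: a (c, c) map relating channels to each other.
    attn_c = softmax(q.T @ k / np.sqrt(n), axis=-1)    # (c, c)
    out_c = v_channel @ attn_c               # (n, c)

    # Fusion by summation is an assumption made for brevity.
    return out_s + out_c, attn_s, attn_c


rng = np.random.default_rng(0)
n, c, p = 512, 16, 32
params = init_params(n, c, p, rng)
x = rng.standard_normal((n, c))
out, attn_s, attn_c = paired_attention(x, params)
print(out.shape, attn_s.shape, attn_c.shape)  # (512, 16) (512, 32) (16, 16)
```

With p held fixed, doubling n doubles the size of the spatial attention map, whereas a standard (n, n) self-attention map would quadruple. Sharing `w_q` and `w_k` also removes one pair of c-by-c projection matrices relative to two fully independent branches.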