Transformers, the default model of choice in natural language processing, have drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers are promising for helping typical convolutional neural networks (convnets) overcome their inherent shortcomings stemming from spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as auxiliary modules that encode global context into convolutional representations, without investigating how to optimally combine self-attention (i.e., the core of transformers) with convolution. To address this issue, in this paper we introduce nnFormer (i.e., Not-aNother transFormer), a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution. In practice, nnFormer learns volumetric representations from 3D local volumes. Compared to a naive voxel-level self-attention implementation, such volume-based operations reduce the computational complexity by approximately 98% and 99.5% on the Synapse and ACDC datasets, respectively. In comparison with prior-art network configurations, nnFormer achieves substantial improvements over previous transformer-based methods on the two commonly used datasets, Synapse and ACDC. For instance, nnFormer outperforms Swin-UNet by over 7 percent on Synapse. Even when compared to nnUNet, currently the best-performing fully convolutional medical segmentation network, nnFormer still provides slightly better performance on Synapse and ACDC.
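To give intuition for why volume-based self-attention is so much cheaper than voxel-level self-attention: for N voxels, global attention requires O(N^2) pairwise interactions, whereas restricting attention to disjoint local volumes of V voxels each costs only O(N * V). The sketch below uses hypothetical feature-map and window sizes (not the paper's actual configuration) purely to illustrate this scaling; the exact 98%/99.5% figures depend on the datasets' input resolutions.

```python
# Compare pairwise-interaction counts of global (voxel-level) self-attention
# vs. local volume-based self-attention. All sizes are hypothetical.

def attention_cost(tokens_per_window: int, num_windows: int) -> int:
    """Each window of V tokens costs V^2 pairwise interactions."""
    return num_windows * tokens_per_window ** 2

D, H, W = 32, 128, 128          # hypothetical 3D feature-map size
N = D * H * W                   # total number of voxels
V = 4 * 4 * 4                   # hypothetical 4x4x4 local volume (window)

global_cost = attention_cost(N, 1)        # one window spanning all voxels
local_cost = attention_cost(V, N // V)    # N/V disjoint local volumes

reduction = 1 - local_cost / global_cost
print(f"pairwise interactions reduced by {reduction:.4%}")
```

The reduction factor is simply V/N, so shrinking the local volume relative to the input drives the savings toward 100%, which is why the paper's measured reductions sit in the 98–99.5% range.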