The Transformer, the model of choice in natural language processing, has drawn scant attention from the medical imaging community. Given its ability to exploit long-term dependencies, the transformer is a promising complement to convolutional neural networks, helping them overcome their inherent limitations of spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as auxiliary modules that encode global context into convolutional representations. To address this issue, we introduce nnFormer, a 3D transformer for volumetric medical image segmentation. nnFormer not only exploits a combination of interleaved convolution and self-attention operations, but also introduces local and global volume-based self-attention mechanisms to learn volume representations. Moreover, nnFormer replaces the traditional concatenation/summation operations in the skip connections of U-Net-like architectures with skip attention. Experiments show that nnFormer outperforms previous transformer-based counterparts by large margins on three public datasets. Compared to nnUNet, nnFormer produces significantly lower HD95 values and comparable DSC results. Furthermore, we show that nnFormer and nnUNet are highly complementary in model ensembling.
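The local volume-based self-attention mentioned above can be illustrated with a minimal sketch: the input volume is partitioned into non-overlapping 3D windows, and self-attention is computed within each window rather than over all voxels. The function names, window sizes, and the omission of learned Q/K/V projections and multiple heads are simplifications for illustration, not nnFormer's actual implementation.

```python
import numpy as np

def window_partition(vol, ws):
    """Split a (D, H, W, C) volume into non-overlapping 3D windows.

    Returns an array of shape (num_windows, window_volume, C).
    Assumes D, H, W are divisible by the window size ws = (d, h, w).
    """
    D, H, W, C = vol.shape
    d, h, w = ws
    vol = vol.reshape(D // d, d, H // h, h, W // w, w, C)
    # Group the three window-index axes first, then the intra-window axes.
    return vol.transpose(0, 2, 4, 1, 3, 5, 6).reshape(-1, d * h * w, C)

def local_self_attention(vol, ws):
    """Single-head scaled dot-product attention inside each local window.

    Hypothetical sketch: real blocks add learned projections, positional
    bias, and a reverse window merge, all omitted here for brevity.
    """
    windows = window_partition(vol, ws)                    # (n, t, C)
    scale = windows.shape[-1] ** -0.5
    scores = windows @ windows.transpose(0, 2, 1) * scale  # (n, t, t)
    scores -= scores.max(axis=-1, keepdims=True)           # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ windows                                  # (n, t, C)
```

Restricting attention to windows keeps the cost linear in the number of voxels; the global variant would instead attend across (a downsampled version of) the whole volume.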
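The skip attention mentioned in the abstract can be sketched as a cross-attention fusion: instead of concatenating or summing encoder and decoder features at a skip connection, the upsampled decoder feature acts as the query and the encoder skip feature supplies keys and values. Everything below is a hypothetical single-head sketch without learned projections, not the paper's actual module.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def skip_attention(decoder_feat, encoder_feat):
    """Fuse a skip connection via cross-attention (illustrative only).

    decoder_feat: (t_dec, C) flattened decoder tokens, used as queries.
    encoder_feat: (t_enc, C) flattened encoder tokens, used as keys/values.
    Returns a (t_dec, C) feature aligned with the decoder resolution.
    """
    scale = decoder_feat.shape[-1] ** -0.5
    attn = softmax(decoder_feat @ encoder_feat.T * scale)  # (t_dec, t_enc)
    return attn @ encoder_feat
```

Compared with plain concatenation, this lets each decoder location select which encoder locations to draw detail from, at the cost of the extra attention computation.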