This paper presents a Transformer architecture for volumetric medical image segmentation. Designing a computationally efficient Transformer architecture for volumetric segmentation is a challenging task. It requires striking a complex balance between encoding local and global spatial cues and preserving information along all axes of the volumetric data. The proposed volumetric Transformer has a U-shaped encoder-decoder design that processes the input voxels in their entirety. Our encoder has two consecutive self-attention layers to simultaneously encode local and global cues, and our decoder has novel parallel shifted-window-based self- and cross-attention blocks that capture fine details for boundary refinement while subsuming Fourier position encoding. Our proposed design choices result in a computationally efficient architecture, which demonstrates promising results for tumor segmentation on the Brain Tumor Segmentation (BraTS) 2021 and Medical Segmentation Decathlon (Pancreas and Liver) datasets. We further show that the representations learned by our model transfer better across datasets and are robust to data corruptions. \href{https://github.com/himashi92/VT-UNet}{Our code implementation is publicly available}.
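To make the decoder design concrete, the following is a minimal, hypothetical PyTorch sketch of a parallel self- and cross-attention block: decoder tokens attend to themselves and, in parallel, to the matching encoder tokens from the skip path, and the two streams are fused. The class name, the learnable fusion weight \texttt{alpha}, and the tensor shapes are illustrative assumptions; shifted-window partitioning and the Fourier position encoding used in the full model are omitted here.

\begin{verbatim}
import torch
import torch.nn as nn

class ParallelSelfCrossAttention(nn.Module):
    """Hypothetical sketch of a decoder block with parallel
    self-attention (over decoder tokens) and cross-attention
    (decoder queries against encoder keys/values)."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Learnable fusion weight between the two streams (an assumption).
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, dec_tokens, enc_tokens):
        # Self-attention: queries, keys, values all come from the decoder.
        sa, _ = self.self_attn(dec_tokens, dec_tokens, dec_tokens)
        # Cross-attention: decoder queries attend to encoder keys/values.
        ca, _ = self.cross_attn(dec_tokens, enc_tokens, enc_tokens)
        # Convex combination of the parallel streams, then a residual connection.
        fused = self.alpha * sa + (1.0 - self.alpha) * ca
        return self.norm(dec_tokens + fused)

# Usage: tokens are flattened 3D windows of shape (batch, num_tokens, dim).
block = ParallelSelfCrossAttention(dim=96, num_heads=3)
dec = torch.randn(2, 343, 96)   # e.g. a 7x7x7 window of decoder tokens
enc = torch.randn(2, 343, 96)   # matching encoder tokens from the skip path
out = block(dec, enc)           # -> (2, 343, 96)
\end{verbatim}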