Medical image segmentation is important for computer-aided diagnosis. Good segmentation requires the model to see the big picture and fine details simultaneously, i.e., to learn image features that incorporate large context while keeping high spatial resolution. To approach this goal, the most widely used methods, U-Net and its variants, extract and fuse multi-scale features. However, the fused features still have small "effective receptive fields" that focus on local image cues, which limits their performance. In this work, we propose Segtran, an alternative segmentation framework based on transformers, which have unlimited "effective receptive fields" even at high feature resolutions. The core of Segtran is a novel Squeeze-and-Expansion transformer: a squeezed attention block regularizes the self-attention of transformers, and an expansion block learns diversified representations. Additionally, we propose a new positional encoding scheme for transformers that imposes a continuity inductive bias for images. Experiments were performed on 2D and 3D medical image segmentation tasks: optic disc/cup segmentation in fundus images (REFUGE'20 challenge), polyp segmentation in colonoscopy images, and brain tumor segmentation in MRI scans (BraTS'19 challenge). Compared with representative existing methods, Segtran consistently achieved the highest segmentation accuracy and exhibited good cross-domain generalization capabilities.