Over the past decade, deep convolutional neural networks have been widely adopted for medical image segmentation and have been shown to achieve adequate performance. However, due to the inductive biases inherent in convolutional architectures, they lack an understanding of long-range dependencies in the image. Recently proposed Transformer-based architectures, which leverage the self-attention mechanism, encode long-range dependencies and learn highly expressive representations. This motivates us to explore Transformer-based solutions and to study the feasibility of using Transformer-based network architectures for medical image segmentation tasks. The majority of existing Transformer-based network architectures proposed for vision applications require large-scale datasets to train properly. However, compared to datasets for vision applications, medical imaging datasets contain relatively few samples, which makes it difficult to train Transformers efficiently for medical applications. To this end, we propose a Gated Axial-Attention model, which extends existing architectures by introducing an additional control mechanism in the self-attention module. Furthermore, to train the model effectively on medical images, we propose a Local-Global training strategy (LoGo), which further improves performance. Specifically, we operate on the whole image and on patches to learn global and local features, respectively. The proposed Medical Transformer (MedT) is evaluated on three different medical image segmentation datasets and is shown to achieve better performance than convolutional and other related Transformer-based architectures. Code: https://github.com/jeya-maria-jose/Medical-Transformer
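The Local-Global idea described above (one pass over the whole image, one pass over patches) can be illustrated with a minimal NumPy sketch. This is not the released MedT code: the function names `split_into_patches`, `stitch_patches`, and `local_global_forward`, the non-overlapping patch grid, and the element-wise sum used to combine the two branches are all assumptions made for illustration, and the two branches are stand-in callables rather than attention networks.

```python
import numpy as np


def split_into_patches(image, grid):
    """Split an (H, W) array into a grid x grid nested list of patches.

    Hypothetical helper: non-overlapping, equally sized patches are an
    assumption of this sketch.
    """
    height, width = image.shape
    if height % grid or width % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = height // grid, width // grid
    return [
        [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] for c in range(grid)]
        for r in range(grid)
    ]


def stitch_patches(patches):
    """Reassemble a nested list of 2-D patches into one 2-D array."""
    return np.block(patches)


def local_global_forward(image, global_branch, local_branch, grid=4):
    """Run one branch on the whole image and one on each patch.

    The global branch sees the full image; the local branch sees each patch
    independently, and its outputs are placed back at the patch locations.
    Summing the two maps is an assumed way of combining them.
    """
    global_out = global_branch(image)
    local_out = stitch_patches(
        [[local_branch(p) for p in row] for row in split_into_patches(image, grid)]
    )
    return global_out + local_out


if __name__ == "__main__":
    image = np.random.rand(128, 128)
    # Stand-in branches: each must return an array shaped like its input.
    fused = local_global_forward(image, lambda x: x, lambda p: p - p.mean())
    print(fused.shape)
```

In this sketch each branch must return an array with the same shape as its input, so the stitched local map lines up with the global map before the two are combined.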