Deep neural networks have become the prevailing technique in medical image processing. However, the most popular convolutional neural network (CNN) based methods for medical image segmentation are imperfect because they can only model long-range dependencies indirectly, by stacking layers or enlarging filters. Transformers, built on the self-attention mechanism, were recently proposed to learn long-range dependencies effectively by modeling attention between all pairs of words regardless of their positions. The idea has since been extended to computer vision by treating image patches as token embeddings. Given the computational complexity of self-attention over a whole image, current transformer-based models settle for a rigid partitioning scheme that potentially discards informative relations. Moreover, current medical transformers model global context on full-resolution images, leading to unnecessary computational costs. To address these issues, we developed a novel method, the Pyramid Medical Transformer (PMTrans), which integrates multi-scale attention and CNN feature extraction in a pyramidal network architecture. PMTrans captures multi-range relations by working on multi-resolution images, and an adaptive partitioning scheme retains informative relations while accessing different receptive fields efficiently. Experimental results on three medical image datasets (gland segmentation, MoNuSeg, and HECKTOR) showed that PMTrans outperformed the latest CNN-based and transformer-based models for medical image segmentation.
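To make the core idea concrete, the sketch below illustrates patch-based self-attention applied at multiple image resolutions and fused, in the spirit of the multi-scale attention described above. This is a minimal illustrative reconstruction, not the authors' PMTrans implementation: the class names (`PatchSelfAttention`, `PyramidAttention`), patch size, pooling scales, and fusion-by-concatenation are all assumptions made for exposition.

```python
# Minimal sketch of multi-resolution patch self-attention in PyTorch.
# Hypothetical reconstruction for illustration, NOT the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchSelfAttention(nn.Module):
    """Embed a feature map into patch tokens and apply multi-head self-attention."""
    def __init__(self, in_ch=64, dim=128, patch=4, heads=4):
        super().__init__()
        # A strided conv turns each patch x patch region into one token embedding.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        tok = self.embed(x)                    # (B, dim, H/p, W/p)
        B, D, h, w = tok.shape
        tok = tok.flatten(2).transpose(1, 2)   # (B, h*w, dim) token sequence
        out, _ = self.attn(*[self.norm(tok)] * 3)  # attention over all patch pairs
        return out.transpose(1, 2).reshape(B, D, h, w)

class PyramidAttention(nn.Module):
    """Run patch attention on several downsampled copies of the feature map,
    so each level covers a different effective receptive field, then fuse."""
    def __init__(self, in_ch=64, dim=128, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels
        self.blocks = nn.ModuleList(PatchSelfAttention(in_ch, dim) for _ in levels)
        self.fuse = nn.Conv2d(dim * len(levels), dim, kernel_size=1)

    def forward(self, x):
        B, C, H, W = x.shape
        outs = []
        for scale, blk in zip(self.levels, self.blocks):
            xi = F.avg_pool2d(x, scale) if scale > 1 else x   # coarser view
            # Upsample every level back to a common grid before fusion.
            outs.append(F.interpolate(blk(xi), size=(H // 4, W // 4),
                                      mode='bilinear', align_corners=False))
        return self.fuse(torch.cat(outs, dim=1))

# Usage: a 128x128 feature map with 64 channels (spatial dims must be
# divisible by the coarsest scale times the patch size, here 16).
x = torch.randn(1, 64, 128, 128)
y = PyramidAttention()(x)   # -> (1, 128, 32, 32)
```

Attention at the coarser pyramid levels sees the whole image with few tokens, while the finest level preserves local detail, which is one cheap way to obtain multi-range relations without running full self-attention at full resolution.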