Deep neural networks have become a prevailing technique in the field of medical image processing. However, the most popular convolutional neural network (CNN) based methods for medical image segmentation are imperfect because they cannot adequately model long-range pixel relations. Transformers and the self-attention mechanism were recently proposed to learn long-range dependencies effectively by modeling attention between all pairs of words regardless of their positions. The idea has also been extended to computer vision by treating image patches as token embeddings. Given the computational complexity of self-attention over whole images, current transformer-based models settle for a rigid partitioning scheme that can lose informative relations. Besides, current medical transformers model global context on full-resolution images, leading to unnecessary computational cost. To address these issues, we developed a novel method that integrates multi-scale attention and CNN feature extraction in a pyramidal network architecture, namely the Pyramid Medical Transformer (PMTrans). PMTrans captures multi-range relations by operating on multi-resolution images. An adaptive partitioning scheme was implemented to retain informative relations and to access different receptive fields efficiently. Experimental results on two medical image datasets, the gland segmentation and MoNuSeg datasets, showed that PMTrans outperformed the latest CNN-based and transformer-based models for medical image segmentation.
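To make the mechanism the abstract refers to concrete, the following is a minimal NumPy sketch of self-attention over image patches: an image is split into non-overlapping patch tokens, and scaled dot-product attention then mixes every pair of tokens regardless of their spatial distance. This is an illustration of the generic mechanism only, not the PMTrans architecture; the function names and patch size are arbitrary choices for the example.

```python
import numpy as np

def patchify(img, p):
    # Split an (H, W) image into non-overlapping p x p patches,
    # each flattened into a vector of length p*p (a "patch token").
    H, W = img.shape
    patches = img.reshape(H // p, p, W // p, p).swapaxes(1, 2)
    return patches.reshape(-1, p * p)

def self_attention(x):
    # Scaled dot-product self-attention over all patch tokens:
    # every patch attends to every other, whatever their positions.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # (N, N) pairwise affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # row-wise softmax
    return attn @ x                               # context-mixed tokens

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
tokens = self_attention(patchify(img, 4))  # 4 patches of 16 pixels each
```

Note that the `(N, N)` affinity matrix is what makes whole-image self-attention expensive: the cost grows quadratically with the number of patches, which is the motivation for the partitioning and multi-resolution strategies discussed above.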