Convolutional neural networks (CNNs) have achieved state-of-the-art performance in medical image segmentation owing to their ability to extract highly complex feature representations. However, recent studies argue that traditional CNNs have limited ability to capture long-range dependencies between different image regions. Following the success of Transformer models in natural language processing, the medical image segmentation field has also seen growing interest in Transformers because of their ability to capture long-range contextual information. Unlike CNNs, however, Transformers lack the ability to learn local feature representations. To exploit the advantages of both CNNs and Transformers, we propose a hybrid encoder-decoder segmentation model, ConvTransSeg. It consists of a multi-layer CNN encoder for feature learning and a corresponding multi-level Transformer decoder for segmentation prediction, with the encoder and decoder interconnected in a multi-resolution manner. We compared our method with many other state-of-the-art hybrid CNN-Transformer segmentation models on binary and multi-class image segmentation tasks using several public medical image datasets, covering skin lesion, polyp, cell and brain tissue segmentation. The experimental results show that our method achieves the best overall performance in terms of the Dice coefficient and average symmetric surface distance (ASSD), with low model complexity and memory consumption. Unlike most of the Transformer-based methods we compared against, our method does not require pre-trained models to achieve similar or better performance. The code is freely available for research purposes on GitHub: (the link will be added upon acceptance).
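For reference, the two evaluation measures named above can be sketched as follows: Dice = 2|A ∩ B| / (|A| + |B|), and ASSD = the mean, over boundary points of both masks, of each point's distance to the nearest boundary point of the other mask. This is a minimal illustration only, assuming 2D binary masks stored as nested lists, unit pixel spacing, and 4-connected boundary extraction. It is not the evaluation code used in this work, and established evaluation toolkits may extract surfaces or handle spacing differently. The function names `dice_coefficient`, `boundary` and `assd` are hypothetical.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]  # assumption: 2D binary mask with values in {0, 1}


def dice_coefficient(pred: Mask, target: Mask) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    inter = size_p = size_t = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += p * t
            size_p += p
            size_t += t
    if size_p + size_t == 0:
        return 1.0
    return 2.0 * inter / (size_p + size_t)


def boundary(mask: Mask) -> List[Tuple[int, int]]:
    """Foreground pixels with at least one 4-neighbour that is background or outside the image."""
    h, w = len(mask), len(mask[0])
    points = []
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if not (0 <= ni < h and 0 <= nj < w) or not mask[ni][nj]:
                    points.append((i, j))
                    break
    return points


def assd(pred: Mask, target: Mask) -> float:
    """Average symmetric surface distance in pixel units (unit spacing assumed)."""
    bp, bt = boundary(pred), boundary(target)
    if not bp or not bt:
        raise ValueError("ASSD is undefined when either mask is empty")
    # Distance from every boundary point to the closest boundary point of the other mask.
    total = sum(min(math.dist(a, b) for b in bt) for a in bp)
    total += sum(min(math.dist(b, a) for a in bp) for b in bt)
    return total / (len(bp) + len(bt))


if __name__ == "__main__":
    # A 2x2 square and the same square shifted one pixel to the right.
    pred = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    target = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
    print(dice_coefficient(pred, target))  # 0.5
    print(assd(pred, target))  # 0.5
```

In this example the shifted square overlaps the original in two of four pixels (Dice 0.5), and half of the boundary points on each side are one pixel away from the other boundary (ASSD 0.5). Higher Dice and lower ASSD indicate better segmentation.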