Identifying polyps is a challenging problem for the automatic analysis of endoscopic images in computer-aided clinical support systems. Models based on convolutional neural networks (CNNs), transformers, and their combinations have been proposed to segment polyps, with promising results. However, these approaches are limited: they either model only the local appearance of polyps or lack multi-level features for capturing spatial dependencies in the decoding process. This paper proposes a novel network, ColonFormer, to address these limitations. ColonFormer is an encoder-decoder architecture capable of modeling long-range semantic information in both the encoder and decoder branches. The encoder is a lightweight transformer-based architecture that models global semantic relations at multiple scales. The decoder is a hierarchical network designed to learn multi-level features that enrich the feature representation. In addition, a refinement module with a new skip connection technique is added to refine the boundaries of polyp objects in the global map for accurate segmentation. Extensive experiments were conducted on five popular benchmark datasets for polyp segmentation: Kvasir, CVC-ClinicDB, CVC-ColonDB, CVC-T, and ETIS-Larib. The experimental results show that ColonFormer outperforms other state-of-the-art methods on all five benchmark datasets.
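To make the data flow described in the abstract concrete, the following is a minimal NumPy sketch of an encoder, decoder, and refinement pipeline. It is not the ColonFormer implementation: the abstract gives no layer details, so average pooling stands in for the transformer encoder, simple averaging stands in for the hierarchical decoder, and a fixed residual correction stands in for the refinement module. All function names (`toy_encoder`, `toy_decoder`, `toy_refine`, `segment`) and the weight `0.1` are hypothetical and chosen only for illustration.

```python
import numpy as np


def avg_pool2(x):
    """Halve the spatial size of an (H, W, C) array with 2x2 average pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample2(x):
    """Double the spatial size of an (H, W, C) array by nearest-neighbour repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def toy_encoder(image, levels=4):
    """Placeholder for a multi-scale encoder: returns features ordered fine to coarse."""
    feats = [avg_pool2(image)]
    for _ in range(levels - 1):
        feats.append(avg_pool2(feats[-1]))
    return feats


def toy_decoder(feats):
    """Placeholder for a hierarchical decoder: fuses features coarse to fine
    and returns a single-channel coarse 'global map' of logits."""
    fused = feats[-1]
    for f in reversed(feats[:-1]):
        fused = 0.5 * (upsample2(fused) + f)
    return fused.mean(axis=-1, keepdims=True)


def toy_refine(global_map, finest_feat, weight=0.1):
    """Placeholder refinement: a skip connection from the finest encoder
    feature adds a small residual correction to the global map."""
    residual = finest_feat.mean(axis=-1, keepdims=True) - global_map
    return global_map + weight * residual


def segment(image):
    """Run the toy pipeline and return per-pixel probabilities at input resolution."""
    feats = toy_encoder(image)
    global_map = toy_decoder(feats)
    refined = toy_refine(global_map, feats[0])
    logits = upsample2(refined)           # back to the input resolution
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid


# Usage: a random 64x64 RGB "image" (height and width must be divisible by 16).
rng = np.random.default_rng(0)
probs = segment(rng.random((64, 64, 3)))
mask = probs > 0.5  # binary polyp mask, shape (64, 64, 1)
```

The sketch only illustrates how multi-scale encoder features, a coarse global map, and a skip-connected refinement step fit together; a real model would replace each placeholder with learned layers.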