Identifying polyps is a challenging problem for the automatic analysis of endoscopic images in computer-aided clinical support systems. Models based on convolutional neural networks (CNNs), transformers, and combinations of the two have been proposed to segment polyps, with promising results. However, these approaches are limited: they either model only the local appearance of polyps or lack multi-level features for capturing spatial dependencies during decoding. This paper proposes a novel network, ColonFormer, to address these limitations. ColonFormer is an encoder-decoder architecture capable of modeling long-range semantic information in both its encoder and decoder branches. The encoder is a lightweight transformer-based architecture that models global semantic relations at multiple scales. The decoder is a hierarchical network designed to learn multi-level features that enrich the feature representation. In addition, a refinement module with a new skip connection technique refines the boundaries of polyp objects in the global map for accurate segmentation. Extensive experiments were conducted on five popular benchmark datasets for polyp segmentation: Kvasir, CVC-ClinicDB, CVC-ColonDB, EndoScene, and ETIS. Experimental results show that ColonFormer achieves state-of-the-art performance on all benchmark datasets.