Medical image segmentation is a fundamental task in medical image analysis. This paper proposes a novel network architecture, referred to as Convolution, Transformer, and Operator (CTO). CTO combines Convolutional Neural Networks (CNNs), a Vision Transformer (ViT), and an explicit boundary detection operator to achieve high recognition accuracy while maintaining an optimal balance between accuracy and efficiency. CTO follows the standard encoder-decoder segmentation paradigm: the encoder network combines a popular CNN backbone, which captures local semantic information, with a lightweight ViT assistant, which integrates long-range dependencies. To strengthen boundary learning, a boundary-guided decoder network is proposed that uses a boundary mask, obtained from a dedicated boundary detection operator, as explicit supervision to guide the decoding process. The proposed method is evaluated on six challenging medical image segmentation datasets, and the results demonstrate that CTO achieves state-of-the-art accuracy with competitive model complexity.
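To make the idea of a boundary mask used as explicit supervision more concrete, the following minimal sketch derives a binary boundary map from a binary segmentation mask. It is an illustration under stated assumptions, not the method of the paper: the abstract does not name the boundary detection operator, so the Sobel kernels, the replicate padding, and the function name `boundary_mask` are all hypothetical choices made here.

```python
# Illustrative sketch only. Sobel is an assumed stand-in for the
# "dedicated boundary detection operator"; the abstract does not name one.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def boundary_mask(mask):
    """Return a binary boundary map for a binary segmentation mask.

    `mask` is a non-empty list of equal-length rows of 0/1 values.
    Pixels outside the image replicate the nearest edge pixel, so the
    image border alone never produces a gradient response.
    """
    h, w = len(mask), len(mask[0])

    def at(r, c):
        # Replicate padding: clamp coordinates to the valid range.
        r = min(max(r, 0), h - 1)
        c = min(max(c, 0), w - 1)
        return mask[r][c]

    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    v = at(r + dr, c + dc)
                    gx += SOBEL_X[dr + 1][dc + 1] * v
                    gy += SOBEL_Y[dr + 1][dc + 1] * v
            # Any non-zero gradient marks a transition between classes.
            out[r][c] = 1 if (gx != 0 or gy != 0) else 0
    return out
```

In a training setup, a map like this could serve as the target for an auxiliary boundary loss alongside the usual segmentation loss. Note that the response produced by this sketch is about two pixels thick, since pixels on both sides of a transition have a non-zero gradient.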