Colorectal polyp classification is a critical clinical examination. To improve classification accuracy, most computer-aided diagnosis algorithms recognize colorectal polyps using Narrow-Band Imaging (NBI). However, NBI is often unavailable in real clinical scenarios, because acquiring it requires manually switching the light mode after polyps have been detected in White-Light (WL) images. To avoid this situation, we propose a novel method that directly achieves accurate white-light colonoscopy image classification by enforcing structured cross-modal representation consistency. In practice, a pair of multi-modal images, i.e., NBI and WL, is fed into a shared Transformer to extract hierarchical feature representations. A newly designed Spatial Attention Module (SAM) is then adopted to calculate the similarities between the class token and the patch tokens for each modality's image. By aligning the class tokens and spatial attention maps of paired NBI and WL images at different levels, the Transformer learns to keep both global and local representations consistent across the two modalities. Extensive experimental results show that the proposed method outperforms recent studies by a margin, realizing multi-modal prediction with a single Transformer while greatly improving classification accuracy when only WL images are available.
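The alignment idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the normalization, or the loss, so the cosine similarity, the softmax, the mean-squared-error terms, and the function names `spatial_attention_map` and `consistency_loss` are all assumptions made for illustration. Each level is represented as a `(class_token, patch_tokens)` pair taken from the shared Transformer.

```python
import numpy as np


def spatial_attention_map(cls_token, patch_tokens):
    """Similarity between one class token (D,) and N patch tokens (N, D).

    Assumption: cosine similarity followed by a softmax over patches.
    """
    cls_unit = cls_token / (np.linalg.norm(cls_token) + 1e-8)
    patch_norms = np.linalg.norm(patch_tokens, axis=1, keepdims=True)
    patch_unit = patch_tokens / (patch_norms + 1e-8)
    sim = patch_unit @ cls_unit          # shape (N,)
    exp = np.exp(sim - sim.max())        # numerically stable softmax
    return exp / exp.sum()


def consistency_loss(nbi_levels, wl_levels):
    """Average cross-modal consistency over feature levels.

    nbi_levels / wl_levels: lists of (cls_token, patch_tokens), one per level.
    Assumption: MSE on class tokens (global) plus MSE on attention maps (local).
    """
    total = 0.0
    for (cls_n, patch_n), (cls_w, patch_w) in zip(nbi_levels, wl_levels):
        global_term = np.mean((cls_n - cls_w) ** 2)
        map_n = spatial_attention_map(cls_n, patch_n)
        map_w = spatial_attention_map(cls_w, patch_w)
        local_term = np.mean((map_n - map_w) ** 2)
        total += global_term + local_term
    return total / len(nbi_levels)
```

In a training loop, a term of this kind would presumably be added to the usual classification loss, so that at inference time the WL branch alone produces features close to those of the paired NBI image.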