Spatial attention mechanisms have been widely incorporated into deep convolutional neural networks (CNNs), where they capture long-range dependencies and significantly improve performance in computer vision, but they may perform poorly in medical imaging. Existing efforts often overlook that long-range dependency capturing is limited in highlighting subtle lesion regions, and they neglect the potential of multi-scale pixel context information to improve the representational capability of CNNs. In this paper, we propose a practical yet lightweight architectural unit, the Pyramid Pixel Context Recalibration (PPCR) module, which exploits multi-scale pixel context information to adaptively recalibrate each pixel position in a pixel-independent manner. PPCR first applies cross-channel pyramid pooling to aggregate multi-scale pixel context information, then eliminates the inconsistency among the aggregated contexts with pixel normalization, and finally estimates a per-pixel attention weight via pixel context integration. PPCR can be flexibly plugged into modern CNNs with negligible overhead. Extensive experiments on five medical image datasets and the CIFAR benchmarks demonstrate the superiority and generalization of PPCR over state-of-the-art attention methods. In-depth analyses explain the inherent behavior of PPCR in the decision-making process, improving the interpretability of CNNs.
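The three stages named above can be illustrated with a small sketch. The abstract does not specify the operators, so everything concrete here is an assumption made for illustration: pooling is taken to be an average over contiguous channel groups at scales (1, 2, 4), pixel normalization is taken to be a per-pixel standardization across the pooled values, and pixel context integration is taken to be a learned weighted sum followed by a sigmoid, with weights shared across positions. The function names are hypothetical and this is not the authors' implementation.

```python
import math


def cross_channel_pyramid_pool(x, scales=(1, 2, 4)):
    """Pool a [C][H][W] feature map along the channel axis at several scales.

    Assumption: for scale s, the C channels are split into s contiguous
    groups and each group is averaged per pixel. Returns [sum(scales)][H][W].
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    context = []
    for s in scales:
        if C % s != 0:
            raise ValueError("channel count must be divisible by every scale")
        g = C // s
        for k in range(s):
            group = x[k * g:(k + 1) * g]
            context.append([[sum(ch[i][j] for ch in group) / g
                             for j in range(W)] for i in range(H)])
    return context


def pixel_normalize(context, eps=1e-5):
    """Standardize each pixel's context vector across the pyramid axis.

    Assumption: zero mean and unit variance per pixel, so that values
    pooled at different scales become comparable.
    """
    M, H, W = len(context), len(context[0]), len(context[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(M)]
    for i in range(H):
        for j in range(W):
            vec = [context[m][i][j] for m in range(M)]
            mean = sum(vec) / M
            var = sum((v - mean) ** 2 for v in vec) / M
            std = math.sqrt(var + eps)
            for m in range(M):
                out[m][i][j] = (vec[m] - mean) / std
    return out


def pixel_context_integration(normed, weights, bias=0.0):
    """Map each pixel's normalized context to one attention weight in (0, 1).

    Assumption: a weighted sum with position-shared weights, then a sigmoid.
    """
    M, H, W = len(normed), len(normed[0]), len(normed[0][0])
    if len(weights) != M:
        raise ValueError("need one weight per pyramid entry")
    attn = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            z = bias + sum(weights[m] * normed[m][i][j] for m in range(M))
            attn[i][j] = 1.0 / (1.0 + math.exp(-z))
    return attn


def ppcr(x, weights, bias=0.0, scales=(1, 2, 4)):
    """Recalibrate x: every channel value at a pixel is scaled by that
    pixel's attention weight, which depends only on that pixel's channels."""
    context = cross_channel_pyramid_pool(x, scales)
    attn = pixel_context_integration(pixel_normalize(context), weights, bias)
    return [[[x[c][i][j] * attn[i][j] for j in range(len(x[0][0]))]
             for i in range(len(x[0]))] for c in range(len(x))]
```

In this sketch the attention weight at a position is computed only from the channel values at that same position, which is one reading of "pixel-independent"; with scales (1, 2, 4) the integration step needs just seven weights and a bias, consistent with the claim of negligible overhead.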