Existing video polyp segmentation (VPS) models typically employ convolutional neural networks (CNNs) to extract features. However, because of their limited receptive fields, CNNs cannot fully exploit the global temporal and spatial information in successive video frames, which leads to false-positive segmentation results. In this paper, we propose a novel PNS-Net (Progressively Normalized Self-attention Network), which efficiently learns representations from polyp videos at real-time speed (~140 fps) on a single RTX 2080 GPU, without post-processing. Our PNS-Net is built solely on a basic normalized self-attention block, equipped with recurrence and CNNs. Experiments on challenging VPS datasets demonstrate that the proposed PNS-Net achieves state-of-the-art performance. We also conduct extensive experiments to study the effectiveness of the channel split, soft-attention, and progressive learning strategies. We find that PNS-Net works well under different settings, making it a promising solution to the VPS task.
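To make the idea of attending over both space and time with a channel split more concrete, the following is a minimal sketch in plain Python. It is not the PNS-Net implementation: the abstract does not define the normalization, the soft-attention step, or the progressive learning strategy, so none of those appear here. The sketch assumes that every spatial position in every frame becomes one token, that the query, key, and value projections are the identity (a real block would learn them), and that the channels are divided into equal groups that are attended independently. The function names `flatten_frames`, `self_attention`, and `channel_split_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_frames(frames):
    """Turn frames[t][position] -> feature vector into one flat token list,
    so that attention can relate positions across different frames."""
    return [list(token) for frame in frames for token in frame]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections
    (an assumption made for brevity; learned projections are omitted)."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        output.append(
            [sum(w * value[c] for w, value in zip(weights, tokens)) for c in range(dim)]
        )
    return output


def channel_split_attention(tokens, groups):
    """Split the channels into equal groups, attend within each group
    independently, and concatenate the results along the channel axis."""
    dim = len(tokens[0])
    if dim % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    step = dim // groups
    outputs = [[] for _ in tokens]
    for g in range(groups):
        chunk = [token[g * step:(g + 1) * step] for token in tokens]
        for row, attended in zip(outputs, self_attention(chunk)):
            row.extend(attended)
    return outputs


# Two frames, two spatial positions each, four channels per position.
frames = [
    [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 2.0]],
    [[1.0, 1.0, 0.0, 0.0], [2.0, 0.0, 1.0, 1.0]],
]
tokens = flatten_frames(frames)
attended = channel_split_attention(tokens, groups=2)
```

Because every token attends to every other token regardless of which frame it came from, the receptive field in this sketch covers the whole clip, which is the property the abstract contrasts with CNNs. The cost grows quadratically with the number of tokens, so a practical model would have to restrict or approximate this in some way.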