Nonlocal-based blocks are designed to capture long-range spatial-temporal dependencies in computer vision tasks. Although they have shown excellent performance, they still lack a mechanism for encoding the rich, structured information among elements in an image or video. In this paper, to analyze the properties of these nonlocal-based blocks theoretically, we provide a new perspective for interpreting them: we view them as a set of graph filters generated on a fully-connected graph. Specifically, when the Chebyshev graph filter is chosen, a unified formulation can be derived for explaining and analyzing the existing nonlocal-based blocks (e.g., nonlocal block, nonlocal stage, double attention block). Furthermore, by considering their spectral properties, we propose an efficient spectral nonlocal block that is more robust and flexible in capturing long-range dependencies than existing nonlocal blocks when inserted into deep neural networks. Experimental results demonstrate the clear improvements and practical applicability of our method on image classification, action recognition, semantic segmentation, and person re-identification tasks.
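To make the graph-filter view concrete, the following is a minimal NumPy sketch of a generic Chebyshev polynomial filter applied on a fully-connected graph built from pairwise feature affinities. It is an illustration under stated assumptions, not the spectral nonlocal block proposed in the paper: the exponential dot-product affinity, the symmetric normalized Laplacian, the choice of 2 as the upper bound on its eigenvalues, and the names `affinity`, `normalized_laplacian`, `chebyshev_filter`, and `theta` are all hypothetical choices made for this example.

```python
import numpy as np


def affinity(x):
    """Dense, symmetric, positive affinity between all rows of x (N x C).

    Assumption: an exponential of scaled dot products, one common choice
    for nonlocal-style pairwise weights.
    """
    logits = (x @ x.T) / np.sqrt(x.shape[1])
    logits = logits - logits.max()  # scalar shift for numerical stability
    return np.exp(logits)


def normalized_laplacian(a):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return np.eye(len(a)) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


def chebyshev_filter(x, theta):
    """Apply sum_k theta[k] * T_k(L_scaled) @ x on the fully-connected graph.

    T_k is the k-th Chebyshev polynomial, evaluated with the recurrence
    T_k = 2 * L_scaled * T_{k-1} - T_{k-2}.
    """
    lap = normalized_laplacian(affinity(x))
    # Eigenvalues of the normalized Laplacian lie in [0, 2]; assuming the
    # upper bound 2, the rescaled operator L - I has spectrum in [-1, 1].
    lap_scaled = lap - np.eye(len(lap))

    t_prev = x  # T_0(L_scaled) x = x
    out = theta[0] * t_prev
    if len(theta) > 1:
        t_curr = lap_scaled @ x  # T_1(L_scaled) x
        out = out + theta[1] * t_curr
        for k in range(2, len(theta)):
            t_next = 2.0 * (lap_scaled @ t_curr) - t_prev
            out = out + theta[k] * t_next
            t_prev, t_curr = t_curr, t_next
    return out
```

Here `x` stands for the flattened feature map, with one row per spatial or spatio-temporal position, and `theta` for the filter coefficients, which would be learned in a network. In a residual design the filtered signal would typically be added back to the input, for example `x + chebyshev_filter(x, theta)`; that wiring is likewise an assumption of this sketch.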