Convolutional neural networks (CNNs) model long-range dependencies by deeply stacking convolutions with small window sizes, which makes optimization difficult. This paper presents region-based non-local (RNL) operations, a family of self-attention mechanisms that capture long-range dependencies directly, without a deep stack of local operations. Given an intermediate feature map, our method recalibrates the feature at each position by aggregating information from the neighboring regions of all positions. By combining a channel attention module with the proposed RNL, we design an attention chain that can be integrated into off-the-shelf CNNs for end-to-end training. We evaluate our method on two video classification benchmarks. Our method outperforms other attention mechanisms, and we achieve state-of-the-art performance on the Something-Something V1 dataset.
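To make the region-based aggregation concrete, below is a minimal PyTorch sketch of what such a block could look like. It is not the paper's implementation: the module name `RegionNonLocal2d` is hypothetical, the example is 2D spatial rather than the paper's spatio-temporal video setting, and it assumes average pooling as the region operator over keys and values, so each attention weight relates a query position to a pooled neighborhood rather than a single position.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionNonLocal2d(nn.Module):
    """Hedged sketch of a region-based non-local (RNL) block.

    Standard non-local attention compares a query position against
    individual key positions; here keys and values are first pooled
    over local windows (an assumption of this sketch), so affinities
    are computed against regions rather than single positions.
    """

    def __init__(self, channels: int, reduction: int = 2, region_size: int = 3):
        super().__init__()
        inter = channels // reduction
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)  # query embedding
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)    # key embedding
        self.g = nn.Conv2d(channels, inter, kernel_size=1)      # value embedding
        # Region aggregation: average-pool over a local window (assumed operator).
        self.region = nn.AvgPool2d(region_size, stride=1, padding=region_size // 2)
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)           # (B, HW, C')
        k = self.region(self.phi(x)).flatten(2)                # (B, C', HW), region-pooled keys
        v = self.region(self.g(x)).flatten(2).transpose(1, 2)  # (B, HW, C'), region-pooled values
        attn = F.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1) # affinity of each position to every region
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)    # aggregate region values
        return x + self.out(y)                                  # residual recalibration of the input

# Usage: the block preserves the feature-map shape, so it can be dropped
# between stages of an off-the-shelf CNN for end-to-end training.
x = torch.randn(2, 64, 14, 14)
y = RegionNonLocal2d(64)(x)
assert y.shape == x.shape
```

Because the block is residual and shape-preserving, chaining it with a channel attention module (as the abstract describes) amounts to applying the two recalibrations in sequence on the same feature map.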