Attention in Attention: Modeling Context Correlation for Efficient Video Classification

Attention mechanisms have significantly boosted the performance of video classification networks by exploiting contextual information. However, current research on video attention generally adopts one specific kind of context (e.g., channel, spatial/temporal, or global context) to refine features and neglects the underlying correlation between contexts when computing attention. This leaves context underused and limits the performance gain. To tackle the problem, this paper proposes an efficient attention-in-attention (AIA) method for element-wise feature refinement. It investigates the feasibility of inserting the channel context into the spatio-temporal attention learning module, referred to as CinST, and the reverse variant, referred to as STinC. Specifically, the video feature contexts are instantiated as dynamics aggregated along a specific axis with global average and max pooling operations. In an AIA module, the first attention block uses one kind of context to guide the computation of the gating weights in the second attention block, which targets the other context. Moreover, all computation in the attention units acts on the pooled dimension, so the increase in computational cost is very small (<0.02%). To verify the method, we densely integrate it into two classical video network backbones and conduct extensive experiments on several standard video classification benchmarks. The source code of AIA is available at https://github.com/haoyanbin918/Attention-in-Attention.
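The workflow described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The layout (a list of channels, each a flat list of spatio-temporal positions), the function names, the sum of average and max pooling, the plain sigmoid gates with no learned weights, and the way the first gate feeds the second are all assumptions made for illustration. The released repository is the reference for the actual layers.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_context(x):
    # One value per channel: average + max over all spatio-temporal positions.
    return [sum(ch) / len(ch) + max(ch) for ch in x]


def st_context(x):
    # One value per position: average + max over all channels.
    return [sum(pos) / len(pos) + max(pos) for pos in zip(*x)]


def cinst(x):
    """Hypothetical CinST: channel attention guides spatio-temporal attention."""
    c_gate = [sigmoid(v) for v in channel_context(x)]
    # Assumption: the channel gate re-weights the features from which
    # the spatio-temporal context is pooled.
    guided = [[g * v for v in ch] for g, ch in zip(c_gate, x)]
    st_gate = [sigmoid(v) for v in st_context(guided)]
    # The spatio-temporal gate refines the original features.
    return [[v * s for v, s in zip(ch, st_gate)] for ch in x]


def stinc(x):
    """Hypothetical STinC: spatio-temporal attention guides channel attention."""
    st_gate = [sigmoid(v) for v in st_context(x)]
    guided = [[v * s for v, s in zip(ch, st_gate)] for ch in x]
    c_gate = [sigmoid(v) for v in channel_context(guided)]
    return [[g * v for v in ch] for g, ch in zip(c_gate, x)]


# Toy input: 2 channels, 3 flattened spatio-temporal positions.
features = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]
refined_cinst = cinst(features)
refined_stinc = stinc(features)
```

Both gates are computed from pooled vectors (length C or length T·H·W) rather than from the full feature tensor, which is consistent with the abstract's point that the attention units operate only on the pooled dimension.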


Related content

Attention mechanism


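The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly on a machine translation task; their work is generally regarded as the first to bring attention into NLP. Attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of attention research.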
