Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN

A 3D Convolutional Neural Network (3D CNN) captures spatial and temporal information in 3D data such as video sequences. However, due to the convolution and pooling mechanisms, some information loss seems unavoidable. To improve visual explanations and classification in 3D CNNs, we propose two approaches: i) aggregating layer-wise global-to-local (global-local) discrete gradients using a trained 3DResNext network, and ii) implementing an attention gating network to improve the accuracy of action recognition. The proposed approach aims to show the usefulness of every layer, termed global-local attention, in a 3D CNN via visual attribution, weakly-supervised action localization, and action recognition. First, the 3DResNext is trained and applied for action classification, and backpropagation is performed with respect to the maximum predicted class. The gradients and activations of every layer are then up-sampled. Next, aggregation is used to produce more nuanced attention, which points out the most critical parts of the input video for the predicted class. We use contour thresholding of the final attention for final localization. We evaluate spatial and temporal action localization in trimmed videos using fine-grained visual explanation via 3DCam. Experimental results show that the proposed approach produces informative visual explanations and discriminative attention. Furthermore, action recognition via attention gating on each layer produces better classification results than the baseline model.
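The layer-wise aggregation and localization steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how channels are weighted, how layers are combined, or how contours are extracted. The sketch assumes Grad-CAM-style channel weights (mean gradient per channel), nearest-neighbour up-sampling, a plain average across layers, and a threshold-plus-bounding-box step as a simplified stand-in for contour thresholding. All function names are hypothetical.

```python
import numpy as np

def layer_attention(activations, gradients):
    """Grad-CAM-style map for one layer (assumed weighting scheme).
    Both inputs have shape (C, T, H, W)."""
    weights = gradients.mean(axis=(1, 2, 3))          # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum -> (T, H, W)
    return np.maximum(cam, 0.0)                       # keep positive evidence only

def upsample_nearest(volume, out_shape):
    """Nearest-neighbour up-sampling of a (T, H, W) volume to out_shape."""
    idx = [np.arange(o) * s // o for o, s in zip(out_shape, volume.shape)]
    return volume[np.ix_(*idx)]

def normalize(volume):
    """Rescale to [0, 1]; a constant volume maps to all zeros."""
    span = volume.max() - volume.min()
    if span == 0:
        return np.zeros_like(volume)
    return (volume - volume.min()) / span

def global_local_attention(layers, out_shape):
    """Average the normalized, up-sampled maps of all layers.
    `layers` is a list of (activations, gradients) pairs.
    Averaging is an assumption; the paper may aggregate differently."""
    maps = [
        normalize(upsample_nearest(layer_attention(a, g), out_shape))
        for a, g in layers
    ]
    return np.mean(maps, axis=0)

def bounding_box(frame_attention, threshold=0.5):
    """Tight (x_min, y_min, x_max, y_max) box around pixels >= threshold.
    Simplified stand-in for contour thresholding on one frame."""
    ys, xs = np.nonzero(frame_attention >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

In use, each (activations, gradients) pair would come from one layer of the trained network after backpropagating the score of the predicted class, and `bounding_box` would be applied to each frame of the aggregated volume to obtain a per-frame localization.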


