Efficient Spatialtemporal Context Modeling for Action Recognition - Zhuanzhi Papers


April 6, 2021

Efficient Spatialtemporal Context Modeling for Action Recognition


Congqi Cao, Yue Lu, Yifan Zhang, Dongmei Jiang, Yanning Zhang

From arXiv; 16 pages, 7 figures

Contextual information plays an important role in action recognition. Local operations have difficulty modeling the relation between two elements separated by a long distance. However, directly modeling the contextual information between any two points incurs a huge cost in computation and memory, especially for action recognition, where there is an additional temporal dimension. Inspired by the 2D criss-cross attention used in segmentation tasks, we propose a recurrent 3D criss-cross attention (RCCA-3D) module to model dense long-range spatiotemporal contextual information in video for action recognition. The global context is factorized into sparse relation maps. At each step, we model the relationship between points that lie on the same line along the horizontal, vertical, and depth directions, which forms a 3D criss-cross structure, and we repeat the same operation with a recurrent mechanism to propagate the relation between points from a line to a plane and finally to the whole spatiotemporal space. Compared with the non-local method, the proposed RCCA-3D module reduces the number of parameters and FLOPs by 25% and 30%, respectively, for video context modeling. We evaluate RCCA-3D with two recent action recognition networks on three datasets and thoroughly analyze the architecture, obtaining the optimal way to factorize and fuse the relation maps. Comparisons with other state-of-the-art methods demonstrate the effectiveness and efficiency of our model.
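The line-to-plane-to-volume propagation described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`criss_cross_mask`, `criss_cross_attention`, `reach_after`), the identity query/key/value projections, the dense masked formulation, and the toy sizes are all assumptions made for illustration. It restricts attention for each position to the positions sharing its temporal, vertical, or horizontal line, and checks how far information can travel after one, two, and three repeated passes.

```python
import numpy as np

def criss_cross_mask(T, H, W):
    """Boolean (N, N) mask, N = T*H*W. Entry [i, j] is True when positions
    i and j agree on at least two of the three coordinates, i.e. they lie
    on the same axis-aligned line (a position also attends to itself)."""
    t, h, w = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([t.ravel(), h.ravel(), w.ravel()], axis=1)  # (N, 3)
    same = coords[:, None, :] == coords[None, :, :]               # (N, N, 3)
    return same.sum(axis=2) >= 2

def criss_cross_attention(x, mask):
    """One attention pass limited to the criss-cross neighbourhood.
    x: (N, C) features. Identity projections are an assumption of this
    sketch; a real module would learn query/key/value projections."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = np.where(mask, scores, -np.inf)       # block off-line positions
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over each line set
    return x + weights @ x                         # residual connection

def reach_after(mask, steps):
    """Which positions can influence each position after `steps` passes."""
    reach = np.eye(mask.shape[0], dtype=bool)
    for _ in range(steps):
        reach = (reach.astype(int) @ mask.astype(int)) > 0
    return reach

# Toy feature volume: T frames, H x W spatial grid, C channels (hypothetical sizes).
T, H, W, C = 4, 5, 6, 8
mask = criss_cross_mask(T, H, W)
x = np.random.default_rng(0).normal(size=(T * H * W, C))

# Recurrence: the same operation applied three times.
out = x
for _ in range(3):
    out = criss_cross_attention(out, mask)

per_position = int(mask[0].sum())                # T + H + W - 2 attended positions
full_after_two = bool(reach_after(mask, 2).all())
full_after_three = bool(reach_after(mask, 3).all())
print(per_position, full_after_two, full_after_three)
```

In this sketch each position attends to only T + H + W - 2 positions per pass instead of all T·H·W, yet three passes are enough for every position to receive information from the whole volume. The relation-map sizes here are not the parameter and FLOP figures reported in the abstract, which refer to the authors' full module.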



