Video data exhibit complex temporal dynamics due to various factors such as camera motion, speed variation, and different activities. To effectively capture these diverse motion patterns, this paper presents a new temporal adaptive module (TAM) that generates video-specific temporal kernels based on the video's own feature maps. TAM adopts a unique two-level adaptive modeling scheme that decouples the dynamic kernel into a location-sensitive importance map and a location-invariant aggregation weight. The importance map is learned within a local temporal window to capture short-term information, while the aggregation weight is generated from a global view with a focus on long-term structure. TAM is a principled module that can be integrated into 2D CNNs to yield a powerful video architecture (TANet) at very small extra computational cost. Extensive experiments on the Kinetics-400 and Something-Something datasets demonstrate that TAM consistently outperforms other temporal modeling methods and achieves state-of-the-art performance at similar complexity.
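To make the two-level scheme concrete, below is a minimal PyTorch sketch of the TAM idea as described in the abstract. It is an illustrative reconstruction, not the authors' reference implementation: the tensor layout, the reduction ratio `beta`, the kernel size `K`, and the exact layers in each branch are assumptions for demonstration.

```python
# Illustrative sketch of a temporal adaptive module (TAM), assuming an
# (N, C, T, H, W) clip layout; hyperparameters `beta` and `kernel_size`
# are hypothetical choices, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TAM(nn.Module):
    def __init__(self, channels, n_frames, kernel_size=3, beta=4):
        super().__init__()
        self.kernel_size = kernel_size
        # Local branch: temporal convolutions over spatially pooled features
        # produce a location-sensitive importance map (short-term information).
        self.local_branch = nn.Sequential(
            nn.Conv1d(channels, channels // beta, kernel_size,
                      padding=kernel_size // 2),
            nn.BatchNorm1d(channels // beta),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // beta, channels, 1),
            nn.Sigmoid(),
        )
        # Global branch: fully connected layers see the whole clip and emit a
        # location-invariant aggregation kernel (long-term structure).
        self.global_branch = nn.Sequential(
            nn.Linear(n_frames, n_frames * beta),
            nn.ReLU(inplace=True),
            nn.Linear(n_frames * beta, kernel_size),
            nn.Softmax(dim=-1),
        )

    def forward(self, x):
        # x: (N, C, T, H, W) -- a clip of T frames with C channels.
        n, c, t, h, w = x.shape
        pooled = x.mean(dim=(3, 4))  # spatial global average pooling -> (N, C, T)
        # Level 1: location-sensitive reweighting by the importance map.
        importance = self.local_branch(pooled)          # (N, C, T)
        x = x * importance.view(n, c, t, 1, 1)
        # Level 2: a video-specific temporal kernel per (sample, channel) pair.
        kernel = self.global_branch(pooled.reshape(n * c, t))  # (N*C, K)
        kernel = kernel.view(n * c, 1, self.kernel_size, 1, 1)
        # Depthwise temporal aggregation: grouped conv applies each adaptive
        # kernel only to its own (sample, channel) slice.
        x = x.reshape(1, n * c, t, h, w)
        x = F.conv3d(x, kernel,
                     padding=(self.kernel_size // 2, 0, 0), groups=n * c)
        return x.view(n, c, t, h, w)
```

In this sketch, the module preserves the input shape, so it could in principle be dropped after a 2D convolution block in a frame-level CNN backbone, matching the abstract's claim that TAM integrates into 2D CNNs to form a video architecture.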