Spatial convolutions are widely used in numerous deep video models. They fundamentally assume spatio-temporal invariance, i.e., shared weights for every location in every frame. This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos. Specifically, TAdaConv empowers spatial convolutions with temporal modelling abilities by calibrating the convolution weights for each frame according to its local and global temporal context. Compared to previous temporal modelling operations, TAdaConv is more efficient because it operates over the convolution kernels instead of the features, whose dimension is an order of magnitude smaller than the spatial resolution. Further, the kernel calibration also brings an increased model capacity. We construct TAda2D networks by replacing the spatial convolutions in ResNet with TAdaConv, which leads to on-par or better performance compared with state-of-the-art approaches on multiple video action recognition and localization benchmarks. We also demonstrate that, as a readily pluggable operation with negligible computation overhead, TAdaConv can effectively improve many existing video models by a convincing margin. Code and models will be made available at https://github.com/alibaba-mmai-research/pytorch-video-understanding.
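To make the core idea concrete, the following is a minimal numpy sketch of a temporally-adaptive convolution, not the paper's exact design: it uses a 1x1 convolution and a single sigmoid-gated linear calibration head (both illustrative assumptions) to show how a shared base kernel is calibrated per frame from that frame's global descriptor before being applied.

```python
import numpy as np

def tadaconv_1x1(frames, base_weight, calib_weight, calib_bias):
    """Sketch of a temporally-adaptive 1x1 convolution.

    frames:       (T, C_in, H, W) video clip
    base_weight:  (C_out, C_in)   shared base kernel
    calib_weight: (C_in, C_in)    hypothetical calibration head weights
    calib_bias:   (C_in,)         hypothetical calibration head bias

    Each frame's kernel is the shared base kernel rescaled by a
    per-frame calibration vector, so the convolution weights vary
    along the temporal dimension while the features are untouched.
    """
    T, C_in, H, W = frames.shape
    outputs = []
    for t in range(T):
        # Global descriptor of frame t via spatial average pooling.
        desc = frames[t].mean(axis=(1, 2))                    # (C_in,)
        # Per-frame calibration factors; sigmoid keeps them bounded.
        alpha = 1.0 / (1.0 + np.exp(-(calib_weight @ desc + calib_bias)))
        # Calibrate the shared kernel for this frame only.
        w_t = base_weight * alpha[None, :]                    # (C_out, C_in)
        # Apply the calibrated 1x1 convolution over channels.
        outputs.append(np.einsum('oc,chw->ohw', w_t, frames[t]))
    return np.stack(outputs)                                  # (T, C_out, H, W)
```

Because the calibration acts on the kernel (C_out x C_in values) rather than on the feature maps (C x H x W values per frame), the per-frame adaptation adds only a small overhead relative to the convolution itself.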