TAda! Temporally-Adaptive Convolutions for Video Understanding - 专知 (Zhuanzhi) Papers


November 24, 2021

TAda! Temporally-Adaptive Convolutions for Video Understanding


Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Mingqian Tang, Ziwei Liu, Marcelo H. Ang Jr

Spatial convolutions are widely used in numerous deep video models. They fundamentally assume spatio-temporal invariance, i.e., they use shared weights for every location in different frames. This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding and shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos. Specifically, TAdaConv empowers spatial convolutions with temporal modelling abilities by calibrating the convolution weights for each frame according to its local and global temporal context. Compared to previous temporal modelling operations, TAdaConv is more efficient because it operates over the convolution kernels instead of the features, and the kernel dimension is an order of magnitude smaller than the spatial resolution of the features. Further, the kernel calibration also increases model capacity. We construct TAda2D networks by replacing the spatial convolutions in ResNet with TAdaConv, which leads to on-par or better performance compared to state-of-the-art approaches on multiple video action recognition and localization benchmarks. We also demonstrate that, as a plug-in operation with negligible computation overhead, TAdaConv can effectively improve many existing video models by a convincing margin. Code and models are available at https://github.com/alibaba-mmai-research/pytorch-video-understanding.
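The per-frame weight calibration described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (the repository linked above is the reference). The function names `frame_descriptors`, `calibration_factors`, and `tada_pointwise_conv` are hypothetical, the calibration rule is a hand-written stand-in for the learned generator, and the convolution is restricted to 1x1 kernels on plain Python lists so the example stays self-contained.

```python
import math

def frame_descriptors(video):
    """Global-average-pool each frame: video[t][c][h][w] -> desc[t][c]."""
    return [[sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in frame]
            for frame in video]

def calibration_factors(descs, scale=0.5):
    """Hypothetical calibration rule (an assumption, not the paper's generator).

    Each frame gets one factor per input channel, computed from
      - local context: mean descriptor over frames t-1, t, t+1
      - global context: mean descriptor over all frames
    The factor is 1 + scale * tanh(local - global), so a static video
    yields factors of exactly 1.0 and leaves the base weights unchanged.
    """
    num_frames, num_channels = len(descs), len(descs[0])
    global_ctx = [sum(d[c] for d in descs) / num_frames
                  for c in range(num_channels)]
    factors = []
    for t in range(num_frames):
        window = descs[max(0, t - 1): t + 2]
        local_ctx = [sum(d[c] for d in window) / len(window)
                     for c in range(num_channels)]
        factors.append([1.0 + scale * math.tanh(local_ctx[c] - global_ctx[c])
                        for c in range(num_channels)])
    return factors

def tada_pointwise_conv(video, base_weight):
    """1x1 convolution with per-frame calibrated weights.

    base_weight[o][c] is shared by all frames; frame t is convolved with
    base_weight[o][c] * alpha[t][c].
    """
    alphas = calibration_factors(frame_descriptors(video))
    out = []
    for frame, alpha in zip(video, alphas):
        height, width = len(frame[0]), len(frame[0][0])
        # Calibrate the small kernel, not the large feature map.
        w_t = [[wgt * a for wgt, a in zip(row, alpha)] for row in base_weight]
        out.append([[[sum(w_t[o][c] * frame[c][y][x] for c in range(len(frame)))
                      for x in range(width)]
                     for y in range(height)]
                    for o in range(len(w_t))])
    return out

# Toy video: 3 frames, 2 channels, 2x2 pixels; frame t is filled with the value t.
video = [[[[float(t)] * 2 for _ in range(2)] for _ in range(2)] for t in range(3)]
alphas = calibration_factors(frame_descriptors(video))
output = tada_pointwise_conv(video, base_weight=[[1.0, 1.0]])
```

In this toy setting the calibration is computed from a `T x C` table of frame descriptors and applied to the kernel, while the `T x C x H x W` features are only touched by the convolution itself. That is the sense in which operating on kernels rather than features keeps the added cost small.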


Related content

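In mathematics (in particular, functional analysis), convolution is a mathematical operation on two functions (f and g) that produces a third function expressing how the shape of one is modified by the other. The term convolution refers both to the resulting function and to the process of computing it. It is defined as the integral of the product of the two functions after one of them is reversed and shifted. The integral is evaluated for all values of the shift, producing the convolution function.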
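The definition above is for continuous functions. A discrete analogue, where the integral becomes a sum over sample indices, can be sketched as follows. The function name `convolve` is chosen here for illustration and is not tied to any particular library.

```python
def convolve(f, g):
    """Full discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, f_i in enumerate(f):
        for j, g_j in enumerate(g):
            # f[i] * g[j] contributes to output index n = i + j,
            # which is the "reverse and shift" of g written as a double loop.
            out[i + j] += f_i * g_j
    return out

result = convolve([1.0, 2.0, 3.0], [0.0, 1.0, 0.5])
# result == [0.0, 1.0, 2.5, 4.0, 1.5]
```

Evaluating the sum for every shift `n` gives one output value per shift, which mirrors the statement that the integral is evaluated for all shift values.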

