This paper presents VTN, a transformer-based framework for video recognition. Inspired by recent developments in vision transformers, we ditch the standard approach in video action recognition that relies on 3D ConvNets and introduce a method that classifies actions by attending to the entire video sequence information. Our approach is generic and builds on top of any given 2D spatial network. In terms of wall-clock runtime, it trains $16.1\times$ faster and runs $5.1\times$ faster during inference while maintaining competitive accuracy compared to other state-of-the-art methods. It enables whole-video analysis, via a single end-to-end pass, while requiring $1.5\times$ fewer GFLOPs. We report competitive results on Kinetics-400 and present an ablation study of VTN properties and the trade-off between accuracy and inference speed. We hope our approach will serve as a new baseline and start a fresh line of research in the video recognition domain. Code and models are available at: https://github.com/bomri/SlowFast/blob/master/projects/vtn/README.md
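To make the described structure concrete, the following is a minimal sketch of a VTN-style model in PyTorch: a 2D spatial backbone is applied per frame, a temporal transformer encoder attends over the whole frame sequence, and a classification head predicts the action. All names (e.g., `VTNSketch`), the choice of ResNet-50 as the 2D backbone, and the use of a standard transformer encoder are illustrative assumptions, not the authors' implementation; consult the repository linked above for the actual code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class VTNSketch(nn.Module):
    """Illustrative VTN-style model (assumption, not the official code):
    per-frame 2D backbone -> temporal transformer encoder -> classifier."""

    def __init__(self, num_classes=400, embed_dim=2048, num_layers=3, num_heads=8):
        super().__init__()
        backbone = resnet50(weights=None)     # any 2D spatial network could be used
        backbone.fc = nn.Identity()           # keep the pooled 2048-d frame features
        self.backbone = backbone
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, video):                 # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)           # (B*T, 3, H, W)
        feats = self.backbone(frames).view(b, t, -1)   # (B, T, embed_dim)
        cls = self.cls_token.expand(b, -1, -1)
        seq = torch.cat([cls, feats], dim=1)   # prepend a classification token
        seq = self.temporal_encoder(seq)       # attend over the entire sequence
        return self.head(seq[:, 0])            # logits from the [CLS] position


# Usage: classify a batch of two 8-frame clips into 400 Kinetics classes.
model = VTNSketch(num_classes=400)
logits = model(torch.randn(2, 8, 3, 224, 224))  # -> (2, 400)
```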