We present Multiscale Vision Transformers (MViT) for video and image recognition, by connecting the seminal idea of multiscale feature hierarchies with transformer models. Multiscale Transformers have several channel-resolution scale stages. Starting from the input resolution and a small channel dimension, the stages hierarchically expand the channel capacity while reducing the spatial resolution. This creates a multiscale pyramid of features, with early layers operating at high spatial resolution to model simple low-level visual information, and deeper layers operating on spatially coarse but complex, high-dimensional features. We evaluate this fundamental architectural prior for modeling the dense nature of visual signals on a variety of video recognition tasks, where it outperforms concurrent vision transformers that rely on large-scale external pre-training and are 5-10x more costly in computation and parameters. We further remove the temporal dimension and apply our model to image classification, where it outperforms prior work on vision transformers. Code is available at: https://github.com/facebookresearch/SlowFast
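The channel-resolution scale stages described above can be sketched as a simple schedule: spatial resolution halves while channel capacity doubles at each stage transition. The function below is a minimal illustration of this pyramid prior; the function name and the starting numbers (56x56 resolution, 96 channels, 4 stages) are illustrative assumptions, not the paper's exact configuration.

```python
def stage_schedule(in_res=56, in_ch=96, num_stages=4):
    """Return the (resolution, channels) pair at each stage of a
    multiscale pyramid: resolution shrinks as channels expand."""
    stages = []
    res, ch = in_res, in_ch
    for _ in range(num_stages):
        stages.append((res, ch))
        res //= 2  # next stage is spatially coarser
        ch *= 2    # next stage has higher channel capacity
    return stages

print(stage_schedule())
# [(56, 96), (28, 192), (14, 384), (7, 768)]
```

Early stages (e.g. 56x56 with 96 channels) model simple low-level visual information at high resolution, while the final stage (e.g. 7x7 with 768 channels) carries complex, high-dimensional features at coarse resolution.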