Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding - Zhuanzhi Papers

Vision · Transform · Models · Pyramid · Image Classification

May 27, 2021

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao

This paper presents a new Vision Transformer (ViT) architecture, Multi-Scale Vision Longformer, which significantly enhances the ViT of \cite{dosovitskiy2020image} for encoding high-resolution images using two techniques. The first is a multi-scale model structure, which provides image encodings at multiple scales with manageable computational cost. The second is the attention mechanism of Vision Longformer, a variant of Longformer \cite{beltagy2020longformer} (originally developed for natural language processing) whose complexity is linear in the number of input tokens. A comprehensive empirical study shows that the new ViT significantly outperforms several strong baselines, including existing ViT models, their ResNet counterparts, and the Pyramid Vision Transformer from concurrent work \cite{wang2021pyramid}, on a range of vision tasks: image classification, object detection, and segmentation. The models and source code are released at \url{https://github.com/microsoft/vision-longformer}.
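To make the "linear complexity" claim concrete, the sketch below builds a Longformer-style 2-D sparse attention pattern and counts how many (query, key) pairs it scores compared with full self-attention. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a simplified layout in which `n_global` global tokens attend to every token, and each image patch attends to all global tokens plus the patches in a square neighborhood of radius `radius`. The released code may window tokens differently. The helper names `longformer_2d_pattern`, `attended_pairs`, and `full_attention_pairs` are hypothetical.

```python
from typing import List, Set


def longformer_2d_pattern(h: int, w: int, radius: int, n_global: int) -> List[Set[int]]:
    """Hypothetical helper: allowed key indices for every query token.

    Assumed token layout (illustration only): indices 0..n_global-1 are
    global tokens, and index n_global + r * w + c is the patch at row r, column c.
    """
    total = n_global + h * w
    global_ids = set(range(n_global))
    allowed: List[Set[int]] = []

    # Global tokens attend to every token (global attention).
    for _ in range(n_global):
        allowed.append(set(range(total)))

    # Each patch attends to all global tokens plus the patches inside a
    # (2*radius+1) x (2*radius+1) square centred on it, clipped at the borders.
    for r in range(h):
        for c in range(w):
            keys = set(global_ids)
            for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                    keys.add(n_global + rr * w + cc)
            allowed.append(keys)
    return allowed


def attended_pairs(h: int, w: int, radius: int, n_global: int) -> int:
    """Number of (query, key) pairs scored under the assumed sparse pattern."""
    return sum(len(keys) for keys in longformer_2d_pattern(h, w, radius, n_global))


def full_attention_pairs(h: int, w: int, n_global: int) -> int:
    """Number of (query, key) pairs scored by ordinary full self-attention."""
    n = n_global + h * w
    return n * n


if __name__ == "__main__":
    # Doubling the grid side quadruples the number of patches N: the sparse
    # count grows roughly 4x, while full attention grows roughly 16x.
    for side in (8, 16, 32):
        sparse = attended_pairs(side, side, radius=1, n_global=1)
        dense = full_attention_pairs(side, side, n_global=1)
        print(f"{side}x{side} patches: sparse={sparse}, full={dense}")
```

For fixed `radius` and `n_global`, each patch scores at most `n_global + (2*radius+1)**2` keys, so the sparse total is bounded by `n_global*(n_global+N) + N*(n_global+(2*radius+1)**2)`, which is linear in the number of patches N, whereas full attention scales as N².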
