In this paper, we study the vision transformer structure at the mobile level and find a dramatic performance drop. We analyze the reason behind this phenomenon and propose a novel irregular patch embedding module and an adaptive patch fusion module to improve performance. We conjecture that vision transformer blocks (which consist of multi-head attention and a feed-forward network) are better suited to handling high-level information than low-level features. The irregular patch embedding module extracts patches with different receptive fields that contain rich high-level information, from which the transformer blocks can obtain the most useful information. The processed patches then pass through the adaptive patch fusion module to produce the final features for the classifier. With our proposed improvements, the traditional uniform vision transformer structure achieves state-of-the-art results at the mobile level. We improve the DeiT baseline by more than 9\% under mobile-level settings and surpass other transformer architectures such as Swin and CoaT by a large margin.
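The abstract does not specify the internals of the two proposed modules, so the following is only a minimal PyTorch sketch of one plausible instantiation of the described pipeline: multi-receptive-field patch extraction, standard transformer blocks, and weighted token fusion before the classifier. All module names, dimensions, the choice of convolutional stems, and the attention-pooling fusion are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class IrregularPatchEmbedding(nn.Module):
    """Hypothetical sketch: extract patch tokens at several receptive
    fields via convolutional stems with different kernels/strides, then
    concatenate the resulting token sequences."""
    def __init__(self, in_chans=3, embed_dim=192, configs=((4, 4), (8, 8), (16, 16))):
        super().__init__()
        self.stems = nn.ModuleList([
            nn.Conv2d(in_chans, embed_dim, kernel_size=k, stride=s)
            for k, s in configs
        ])

    def forward(self, x):
        # Each stem covers the image at a different granularity:
        # (B, C, H, W) -> (B, H'*W', embed_dim) per stem.
        tokens = [stem(x).flatten(2).transpose(1, 2) for stem in self.stems]
        return torch.cat(tokens, dim=1)  # (B, N_total, embed_dim)

class AdaptivePatchFusion(nn.Module):
    """Hypothetical sketch: fuse processed tokens into one feature vector
    with learned per-token weights (a simple attention-pooling stand-in)."""
    def __init__(self, embed_dim=192):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, tokens):
        w = torch.softmax(self.score(tokens), dim=1)  # (B, N, 1)
        return (w * tokens).sum(dim=1)                # (B, embed_dim)

# Pipeline: irregular embedding -> transformer blocks -> adaptive fusion -> classifier.
embed = IrregularPatchEmbedding()
blocks = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=192, nhead=3, dim_feedforward=768, batch_first=True),
    num_layers=4,
)
fuse = AdaptivePatchFusion()
head = nn.Linear(192, 1000)  # e.g. ImageNet-1k classifier

x = torch.randn(2, 3, 224, 224)
logits = head(fuse(blocks(embed(x))))
print(logits.shape)  # torch.Size([2, 1000])
```

Under these assumptions, the 4/8/16-stride stems yield 3136, 784, and 196 tokens respectively on a 224x224 input, so the transformer blocks attend jointly across granularities before the fusion step pools them for classification.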