The recent amalgamation of transformer and convolutional designs has led to steady improvements in the accuracy and efficiency of models. In this work, we introduce FastViT, a hybrid vision transformer architecture that achieves a state-of-the-art latency-accuracy trade-off. To this end, we introduce RepMixer, a novel token-mixing operator and a basic building block of FastViT, which uses structural reparameterization to lower memory access cost by removing skip connections from the network. We further apply train-time overparametrization and large-kernel convolutions to boost accuracy, and empirically show that these choices have minimal effect on latency. We show that our model is 3.5x faster than CMT, a recent state-of-the-art hybrid transformer architecture, 4.9x faster than EfficientNet, and 1.9x faster than ConvNeXt on a mobile device at the same accuracy on the ImageNet dataset. At similar latency, our model obtains 4.2% better Top-1 accuracy on ImageNet than MobileOne. Our model consistently outperforms competing architectures across several tasks (image classification, detection, segmentation, and 3D mesh regression) with significant improvements in latency on both a mobile device and a desktop GPU. Furthermore, our model is highly robust to out-of-distribution samples and corruptions, improving over competing robust models.
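For intuition, here is a minimal, self-contained sketch of the structural-reparameterization idea behind RepMixer: a residual depthwise token-mixing block is trained with a skip connection, then folded into a single depthwise convolution for inference, removing the skip (and its memory access cost) entirely. This sketch is not the paper's implementation: it uses the standard RepVGG-style conv-then-BN ordering so the fold is exact, whereas the paper states the train-time form Y = DWConv(BN(X)) + X; the class name `RepMixerSketch` and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RepMixerSketch(nn.Module):
    """Sketch of a RepMixer-style reparameterizable token mixer.

    Train time:   y = x + BN(DWConv(x))   -- skip connection present
    After fusing: y = DWConv'(x)          -- single depthwise conv, skip removed
    """

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.dim, self.k = dim, kernel_size
        # Depthwise conv mixes tokens spatially, one channel at a time.
        self.conv = nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2,
                              groups=dim, bias=False)
        self.bn = nn.BatchNorm2d(dim)
        self.fused = None  # populated by reparameterize()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.fused is not None:
            return self.fused(x)          # inference path: no residual add
        return x + self.bn(self.conv(x))  # train path: residual add

    @torch.no_grad()
    def reparameterize(self) -> None:
        # Fold BN into the conv (standard conv->BN fusion): BN is a
        # per-channel affine map y = scale * y_conv + shift in eval mode.
        std = torch.sqrt(self.bn.running_var + self.bn.eps)
        scale = self.bn.weight / std
        w = self.conv.weight * scale.reshape(-1, 1, 1, 1)  # [dim, 1, k, k]
        b = self.bn.bias - self.bn.running_mean * scale

        # Fold the identity skip into the kernel: +1 at each kernel center,
        # since convolving with a center-only kernel reproduces x exactly.
        w[:, 0, self.k // 2, self.k // 2] += 1.0

        self.fused = nn.Conv2d(self.dim, self.dim, self.k,
                               padding=self.k // 2, groups=self.dim, bias=True)
        self.fused.weight.copy_(w)
        self.fused.bias.copy_(b)


if __name__ == "__main__":
    # Quick equivalence check: outputs match before and after fusion.
    m = RepMixerSketch(8).eval()
    x = torch.randn(1, 8, 14, 14)
    y_train = m(x)
    m.reparameterize()
    assert torch.allclose(y_train, m(x), atol=1e-5)
```

Because every operation in the train-time branch is linear (no activation between the conv, the BN, and the skip), the fold is lossless: the inference-time block computes the identical function with one operator and no extra memory traffic for the residual.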