Mobile vision transformers (MobileViT) can achieve state-of-the-art performance across several mobile vision tasks, including classification and detection. Though these models have fewer parameters, they have high latency compared with convolutional neural network-based models. The main efficiency bottleneck in MobileViT is the multi-headed self-attention (MHA) in transformers, which requires $O(k^2)$ time complexity with respect to the number of tokens (or patches) $k$. Moreover, MHA requires costly operations (e.g., batch-wise matrix multiplication) for computing self-attention, impacting latency on resource-constrained devices. This paper introduces a separable self-attention method with linear complexity, i.e., $O(k)$. A simple yet effective characteristic of the proposed method is that it uses element-wise operations for computing self-attention, making it a good choice for resource-constrained devices. The improved model, MobileViTv2, is state-of-the-art on several mobile vision tasks, including ImageNet object classification and MS-COCO object detection. With about three million parameters, MobileViTv2 achieves a top-1 accuracy of 75.6% on the ImageNet dataset, outperforming MobileViT by about 1% while running $3.2\times$ faster on a mobile device. Our source code is available at: \url{https://github.com/apple/ml-cvnets}
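To make the element-wise formulation concrete, the sketch below shows one way a separable, $O(k)$ self-attention layer can be written in PyTorch: each token gets a scalar context score (softmax over the $k$ tokens), the scores weight a single global context vector, and that vector is broadcast back to every token with an element-wise product, so no $k \times k$ attention matrix is ever formed. The layer names, single-head structure, and projection sizes are illustrative assumptions for this sketch, not the released implementation.

\begin{verbatim}
# Minimal sketch of a separable self-attention layer (illustrative, not the
# authors' exact implementation). Input: (batch B, tokens k, dim d).
import torch
import torch.nn as nn

class SeparableSelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_scores = nn.Linear(dim, 1)   # one context score per token
        self.to_key = nn.Linear(dim, dim)
        self.to_value = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, k, d)
        scores = torch.softmax(self.to_scores(x), dim=1)               # (B, k, 1)
        # Weighted sum over tokens -> one global context vector, O(k).
        context = (scores * self.to_key(x)).sum(dim=1, keepdim=True)   # (B, 1, d)
        # Broadcast the context to every token via element-wise product;
        # no k x k attention matrix is computed.
        out = torch.relu(self.to_value(x)) * context                   # (B, k, d)
        return self.out_proj(out)

# Quick shape check: 2 samples, 256 tokens, 64-dim embeddings.
x = torch.randn(2, 256, 64)
y = SeparableSelfAttention(64)(x)
print(y.shape)  # torch.Size([2, 256, 64])
\end{verbatim}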