VOLO:视觉识别远景展望 (VOLO: Vision Outlooker for Visual Recognition) - 专知论文

会员服务 ·

0

Performer · Vision · 验证集 · MoDELS · 模型评估 ·

2021 年 6 月 28 日

VOLO: Vision Outlooker for Visual Recognition

翻译：VOLO:视觉识别远景展望

Li Yuan,Qibin Hou,Zihang Jiang,Jiashi Feng,Shuicheng Yan

from arxiv, code: https://github.com/sail-sg/volo

Visual recognition has been dominated by convolutional neural networks (CNNs) for years. Though recently the prevailing vision transformers (ViTs) have shown great potential of self-attention based models in ImageNet classification, their performance is still inferior to that of the latest SOTA CNNs if no extra data are provided. In this work, we try to close the performance gap and demonstrate that attention-based models are indeed able to outperform CNNs. We find a major factor limiting the performance of ViTs for ImageNet classification is their low efficacy in encoding fine-level features into the token representations. To resolve this, we introduce a novel outlook attention and present a simple and general architecture, termed Vision Outlooker (VOLO). Unlike self-attention that focuses on global dependency modeling at a coarse level, the outlook attention efficiently encodes finer-level features and contexts into tokens, which is shown to be critically beneficial to recognition performance but largely ignored by the self-attention. Experiments show that our VOLO achieves 87.1% top-1 accuracy on ImageNet-1K classification, which is the first model exceeding 87% accuracy on this competitive benchmark, without using any extra training data In addition, the pre-trained VOLO transfers well to downstream tasks, such as semantic segmentation. We achieve 84.3% mIoU score on the cityscapes validation set and 54.3% on the ADE20K validation set. Code is available at \url{https://github.com/sail-sg/volo}.

翻译：多年来,视觉认知一直以神经神经网络(CNNs)为主。尽管最近流行的视觉变压器(ViTs)在图像网络分类中表现出了以自我关注为基础的模型的巨大潜力,但是,如果不提供额外数据,其性能仍然低于最新的SOTA有线电视新闻网。在这项工作中,我们试图缩小性能差距,并表明以关注为基础的模型确实能够超过CNN。我们发现一个限制VITs图像网络分类的性能的主要因素是它们将微调级功能编码的低效率。为了解决这个问题,我们引入了新的展望关注,并展示了一个简单和一般的架构,称为VivisionOutorer(VOLO)。与侧重于全球依赖型模型的自我关注程度相比,没有提供额外数据,而是将精细化的功能和背景编成符号。我们VOLOV20网络分类的可用性能为87.1%的顶级/一级精确度。这是第一个模型,在竞争力基准上超过87%的市级数据转换,没有使用任何额外的标准。

1

相关内容

Performer

【知识迁移视觉识别综述论文】Knowledge Transfer in Vision Recognition: A Survey

【知识迁移视觉识别综述论文】Knowledge Transfer in Vision Recognition: A Survey

专知会员服务

30+阅读 · 2020年4月19日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【CVPR 2019 | tutorial】野外家庭的视觉识别： Visual Recognition of Families In the Wild

【CVPR 2019 | tutorial】野外家庭的视觉识别： Visual Recognition of Families In the Wild

专知会员服务

10+阅读 · 2019年11月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

简评 | Video Action Recognition 的近期进展

简评 | Video Action Recognition 的近期进展

极市平台

20+阅读 · 2019年4月21日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【推荐】视频目标分割基础

【推荐】视频目标分割基础

机器学习研究会

9+阅读 · 2017年9月19日

ActivityNet Challenge 2017 冠军方案分享

ActivityNet Challenge 2017 冠军方案分享

极市平台

4+阅读 · 2017年7月25日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision

Arxiv

0+阅读 · 2021年8月26日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

Involution: Inverting the Inherence of Convolution for Visual Recognition

Arxiv

6+阅读 · 2021年3月10日

Counterfactual Zero-Shot and Open-Set Visual Recognition

Arxiv

12+阅读 · 2021年3月1日

Contrastive Multi-View Representation Learning on Graphs

Arxiv

5+阅读 · 2020年6月10日

Adversarial Examples Improve Image Recognition

Arxiv

4+阅读 · 2019年11月21日

Compact Trilinear Interaction for Visual Question Answering

Arxiv

6+阅读 · 2019年9月26日

SlowFast Networks for Video Recognition

SlowFast Networks for Video Recognition

Arxiv

19+阅读 · 2018年12月10日

Global-and-local attention networks for visual recognition

Global-and-local attention networks for visual recognition

Arxiv

5+阅读 · 2018年9月6日

Graph Convolutional Networks for Named Entity Recognition

Arxiv

17+阅读 · 2018年2月14日

VIP会员

文章信息

相关主题

相关VIP内容

【知识迁移视觉识别综述论文】Knowledge Transfer in Vision Recognition: A Survey

【知识迁移视觉识别综述论文】Knowledge Transfer in Vision Recognition: A Survey

专知会员服务

30+阅读 · 2020年4月19日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【CVPR 2019 | tutorial】野外家庭的视觉识别： Visual Recognition of Families In the Wild

【CVPR 2019 | tutorial】野外家庭的视觉识别： Visual Recognition of Families In the Wild

专知会员服务

10+阅读 · 2019年11月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

面向具身智能的多模态数据存储与检索：综述

《算法战争研究计划全景评估》35页

【CMU博士论文】水下三维视觉感知与生成

智能体战争：自主人工智能军备竞赛全景透视

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

简评 | Video Action Recognition 的近期进展

简评 | Video Action Recognition 的近期进展

极市平台

20+阅读 · 2019年4月21日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【推荐】视频目标分割基础

【推荐】视频目标分割基础

机器学习研究会

9+阅读 · 2017年9月19日

ActivityNet Challenge 2017 冠军方案分享

ActivityNet Challenge 2017 冠军方案分享

极市平台

4+阅读 · 2017年7月25日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision

Arxiv

0+阅读 · 2021年8月26日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

Involution: Inverting the Inherence of Convolution for Visual Recognition

Arxiv

6+阅读 · 2021年3月10日

Counterfactual Zero-Shot and Open-Set Visual Recognition

Arxiv

12+阅读 · 2021年3月1日

Contrastive Multi-View Representation Learning on Graphs

Arxiv

5+阅读 · 2020年6月10日

Adversarial Examples Improve Image Recognition

Arxiv

4+阅读 · 2019年11月21日

Compact Trilinear Interaction for Visual Question Answering

Arxiv

6+阅读 · 2019年9月26日

SlowFast Networks for Video Recognition

SlowFast Networks for Video Recognition

Arxiv

19+阅读 · 2018年12月10日

Global-and-local attention networks for visual recognition

Global-and-local attention networks for visual recognition

Arxiv

5+阅读 · 2018年9月6日

Graph Convolutional Networks for Named Entity Recognition

Arxiv

17+阅读 · 2018年2月14日

微信扫码咨询专知VIP会员