Convolutional Embedding Makes Hierarchical Vision Transformer Stronger - 专知 Papers

August 1, 2022

Convolutional Embedding Makes Hierarchical Vision Transformer Stronger

Cong Wang,Hongmin Xu,Xiong Zhang,Li Wang,Zhitong Zheng,Haifeng Liu

From arXiv; ECCV 2022

Vision Transformers (ViTs) have recently dominated a range of computer vision tasks, yet they suffer from low training-data efficiency and inferior local semantic representation capability when they lack an appropriate inductive bias. Convolutional neural networks (CNNs) inherently capture region-aware semantics, inspiring researchers to introduce CNNs back into the architecture of ViTs to provide a desirable inductive bias. However, is the locality achieved by the micro-level CNNs embedded in ViTs good enough? In this paper, we investigate this question by exploring in depth how the macro architecture of hybrid CNNs/ViTs enhances the performance of hierarchical ViTs. In particular, we study the role of token embedding layers, also called convolutional embedding (CE), and systematically reveal how CE injects desirable inductive bias into ViTs. In addition, we apply the optimal CE configuration to 4 recently released state-of-the-art ViTs, effectively boosting their performance. Finally, a family of efficient hybrid CNNs/ViTs, dubbed CETNets, is released, which may serve as generic vision backbones. Specifically, CETNets achieve 84.9% Top-1 accuracy on ImageNet-1K (training from scratch), 48.6% box mAP on the COCO benchmark, and 51.6% mIoU on ADE20K, substantially improving on the corresponding state-of-the-art baselines.
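To make the idea of a token embedding layer injecting locality more concrete, the sketch below compares a non-overlapping patch embedding with a small stack of overlapping convolutions that reaches the same downsampling factor. It is only an illustration under stated assumptions: the `Conv` record, the `embed_stats` helper, and both layer configurations are hypothetical and are not the CE configuration studied in the paper. The helper uses the standard convolution output-size and receptive-field formulas.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Conv:
    """Hypothetical description of one square convolution layer."""
    kernel: int
    stride: int
    padding: int


def embed_stats(layers: List[Conv], size: int) -> Tuple[int, int, int]:
    """Return (token grid side, receptive field, total stride) of an embedding stem.

    size is the input image side in pixels; the image is assumed square.
    """
    rf, jump = 1, 1  # receptive field and cumulative stride, both in input pixels
    for layer in layers:
        # Standard convolution output-size formula.
        size = (size + 2 * layer.padding - layer.kernel) // layer.stride + 1
        # Each layer widens the receptive field by (kernel - 1) * current jump.
        rf += (layer.kernel - 1) * jump
        jump *= layer.stride
    return size, rf, jump


# Assumed configurations, chosen only for illustration.
patch_embed = [Conv(kernel=4, stride=4, padding=0)]  # non-overlapping 4x4 patches
conv_embed = [
    Conv(kernel=3, stride=2, padding=1),
    Conv(kernel=3, stride=1, padding=1),
    Conv(kernel=3, stride=2, padding=1),
]

for name, stem in (("patch", patch_embed), ("conv", conv_embed)):
    side, rf, stride = embed_stats(stem, 224)
    overlaps = rf > stride  # neighbouring tokens share input pixels
    print(f"{name}: {side}x{side} tokens, receptive field {rf}, "
          f"stride {stride}, overlapping={overlaps}")
```

Both stems produce the same 56x56 token grid from a 224x224 input, but each token of the convolutional stem sees an 11-pixel window that overlaps its neighbours, whereas each patch token sees only its own 4-pixel patch. This overlap is one simple way to read the locality bias that a convolutional embedding can add.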
