HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling - Zhuanzhi (专知) Papers

May 30, 2022

Xiaosong Zhang, Yunjie Tian, Wei Huang, Qixiang Ye, Qi Dai, Lingxi Xie, Qi Tian

Recently, masked image modeling (MIM) has offered a new methodology for self-supervised pre-training of vision transformers. A key idea for efficient implementation is to discard the masked image patches (or tokens) throughout the target network (encoder), which requires the encoder to be a plain vision transformer (e.g., ViT), even though hierarchical vision transformers (e.g., Swin Transformer) have potentially better properties for formulating vision inputs. In this paper, we offer a new design of hierarchical vision transformers named HiViT (short for Hierarchical ViT) that enjoys both high efficiency and good performance in MIM. The key is to remove the unnecessary "local inter-unit operations", deriving structurally simple hierarchical vision transformers in which mask-units can be serialized as in plain vision transformers. For this purpose, we start with Swin Transformer and (i) set the masking unit size to be the token size in the main stage of Swin Transformer, (ii) switch off inter-unit self-attentions before the main stage, and (iii) eliminate all operations after the main stage. Empirical studies demonstrate the advantageous performance of HiViT in terms of fully-supervised, self-supervised, and transfer learning. In particular, when running MAE on ImageNet-1K, HiViT-B reports a +0.6% accuracy gain over ViT-B and a 1.9$\times$ speed-up over Swin-B, and the performance gain generalizes to downstream tasks of detection and segmentation. Code will be made publicly available.
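The serialization idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code: the function names are hypothetical, the 16-pixel mask unit and 4-pixel patch follow Swin Transformer's usual strides (an assumption, since the abstract gives no numbers), and a token-wise linear layer with ReLU plus in-unit 2x2 merging merely stands in for the early-stage operations. The point it shows is that when early operations never cross mask-unit boundaries, running them on the visible units alone gives the same result as running them on all units and discarding the masked ones afterwards.

```python
import math
import numpy as np

def to_mask_units(image, unit=16, patch=4):
    """Split an (H, W, C) image into mask units made of small patches.

    Returns shape (num_units, patches_per_unit, patch * patch * C).
    The sizes 16 and 4 are assumed values, not taken from the abstract.
    """
    H, W, C = image.shape
    gh, gw = H // unit, W // unit          # grid of mask units
    k = unit // patch                      # patches per unit side
    x = image.reshape(gh, k, patch, gw, k, patch, C)
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)   # (gh, gw, k, k, patch, patch, C)
    return x.reshape(gh * gw, k * k, patch * patch * C)

def random_keep_indices(num_units, mask_ratio, rng):
    """Pick the indices of the mask units that stay visible."""
    num_keep = int(round(num_units * (1.0 - mask_ratio)))
    return np.sort(rng.permutation(num_units)[:num_keep])

def intra_unit_stage(units, weight):
    """Stand-in early stage: token-wise linear + ReLU, then 2x2 merging
    inside each unit. No information moves between units."""
    n, t, _ = units.shape
    k = math.isqrt(t)
    h = np.maximum(units @ weight, 0.0)            # (n, t, d_out)
    h = h.reshape(n, k // 2, 2, k // 2, 2, -1)
    h = h.transpose(0, 1, 3, 2, 4, 5)              # group 2x2 neighbours
    return h.reshape(n, (k // 2) ** 2, -1)         # (n, t / 4, 4 * d_out)

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64, 3))
units = to_mask_units(image)                       # 16 units of 16 patches
keep = random_keep_indices(len(units), 0.75, rng)  # 4 visible units
weight = rng.standard_normal((units.shape[-1], 8))

visible_only = intra_unit_stage(units[keep], weight)
full_then_drop = intra_unit_stage(units, weight)[keep]
print(np.allclose(visible_only, full_then_drop))
```

An operation that mixes tokens across unit boundaries, such as window attention spanning several units, would break this equivalence, which is why such operations have to be switched off before masked units can be dropped from the encoder input.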
