BeiT v 2: 使用矢量定量视觉收缩器进行遮盖图像建模 (BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers) - 专知论文

会员服务 ·

0

BEiT · 掩码 · 词元分析器 · 模型评估 · MoDELS ·

2022 年 8 月 12 日

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

翻译：BeiT v 2: 使用矢量定量视觉收缩器进行遮盖图像建模

Zhiliang Peng,Li Dong,Hangbo Bao,Qixiang Ye,Furu Wei

Masked image modeling (MIM) has demonstrated impressive results in self-supervised representation learning by recovering corrupted image patches. However, most methods still operate on low-level image pixels, which hinders the exploitation of high-level semantics for representation models. In this study, we propose to use a semantic-rich visual tokenizer as the reconstruction target for masked prediction, providing a systematic way to promote MIM from pixel-level to semantic-level. Specifically, we introduce vector-quantized knowledge distillation to train the tokenizer, which discretizes a continuous semantic space to compact codes. We then pretrain vision Transformers by predicting the original visual tokens for the masked image patches. Moreover, we encourage the model to explicitly aggregate patch information into a global image representation, which facilities linear probing. Experiments on image classification and semantic segmentation show that our approach outperforms all compared MIM methods. On ImageNet-1K (224 size), the base-size BEiT v2 achieves 85.5% top-1 accuracy for fine-tuning and 80.1% top-1 accuracy for linear probing. The large-size BEiT v2 obtains 87.3% top-1 accuracy for ImageNet-1K (224 size) fine-tuning, and 56.7% mIoU on ADE20K for semantic segmentation. The code and pretrained models are available at https://aka.ms/beit.

翻译：遮蔽图像模型( MIM) 通过恢复被腐蚀的图像补丁, 在自我监督的代表学习中展示了令人印象深刻的成果。然而, 大多数方法仍然在低层次的图像像素上运行, 这阻碍了对高层次图像模型的利用。在此研究中, 我们提议使用一个语义丰富的视觉象征器作为遮蔽预测的重建目标, 提供一个系统化的方法, 从像素层次到语义层次, 推广MIM。具体地说, 我们引入矢量定量的知识蒸馏法, 以训练代号器, 该代号将连续的语义空间分解到紧凑的代码中。我们随后通过预测隐藏图像模型模型的原始视觉符号来阻碍对高级图像模型的利用。此外, 我们鼓励该模型将精密的拼凑信息作为全球图像图象显示的重建目标, 用于进行线性观测。关于图像分类和语义分类的实验显示, 我们的方法比MIM方法都超越了20 。在图像Net-1K (224), 基级的 BeiT v2 达到85.5% 顶级的顶端- 1 用于精确度精确度和高级 K- 1 级高级的级的级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级

0

相关内容

BEiT

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

专知会员服务

20+阅读 · 2020年6月23日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

专知

25+阅读 · 2018年4月15日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

Tisp40在肾缺血再灌注损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

舰船泵-桨-舵鲁棒智能协调控制技术研究

国家自然科学基金

1+阅读 · 2013年12月31日

新型杂化介孔光催化材料的制备及其降解大气污染物的研究

国家自然科学基金

0+阅读 · 2013年12月31日

新型聚芴膜的构筑及对空气和水中芳烃类爆炸物的检测

国家自然科学基金

0+阅读 · 2013年12月31日

复杂形体时空动态变化生成技术

国家自然科学基金

0+阅读 · 2012年12月31日

基于感知哈希的视频篡改快速检测与多粒度定位技术研究

国家自然科学基金

0+阅读 · 2011年12月31日

中文自动口语摘要技术研究

国家自然科学基金

1+阅读 · 2011年12月31日

碳纳米管/泡沫炭复合材料制备、结构及增强机理

国家自然科学基金

0+阅读 · 2011年12月31日

阿部鲻鰕虎生物标记物对河口POPs污染的预警

国家自然科学基金

0+阅读 · 2009年12月31日

利用GPS与IM/WS干涉测量监测鲜水河断层变形

国家自然科学基金

0+阅读 · 2008年12月31日

Real-World Robot Learning with Masked Visual Pre-training

Arxiv

0+阅读 · 2022年10月6日

SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

Arxiv

0+阅读 · 2022年10月5日

CLINICAL: Targeted Active Learning for Imbalanced Medical Image Classification

Arxiv

0+阅读 · 2022年10月4日

Masked Supervised Learning for Semantic Segmentation

Arxiv

0+阅读 · 2022年10月3日

Bias Mimicking: A Simple Sampling Approach for Bias Mitigation

Arxiv

0+阅读 · 2022年9月30日

Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN

Arxiv

0+阅读 · 2022年9月29日

Masked Autoencoders Are Scalable Vision Learners

Arxiv

27+阅读 · 2021年11月11日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Arxiv

19+阅读 · 2020年3月31日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

VIP会员

文章信息

相关主题

词元分析器

相关VIP内容

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

专知会员服务

20+阅读 · 2020年6月23日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

专知

25+阅读 · 2018年4月15日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

相关论文

Real-World Robot Learning with Masked Visual Pre-training

Arxiv

0+阅读 · 2022年10月6日

SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

Arxiv

0+阅读 · 2022年10月5日

CLINICAL: Targeted Active Learning for Imbalanced Medical Image Classification

Arxiv

0+阅读 · 2022年10月4日

Masked Supervised Learning for Semantic Segmentation

Arxiv

0+阅读 · 2022年10月3日

Bias Mimicking: A Simple Sampling Approach for Bias Mitigation

Arxiv

0+阅读 · 2022年9月30日

Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN

Arxiv

0+阅读 · 2022年9月29日

Masked Autoencoders Are Scalable Vision Learners

Arxiv

27+阅读 · 2021年11月11日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Arxiv

19+阅读 · 2020年3月31日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

相关基金

Tisp40在肾缺血再灌注损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

舰船泵-桨-舵鲁棒智能协调控制技术研究

国家自然科学基金

1+阅读 · 2013年12月31日

新型杂化介孔光催化材料的制备及其降解大气污染物的研究

国家自然科学基金

0+阅读 · 2013年12月31日

新型聚芴膜的构筑及对空气和水中芳烃类爆炸物的检测

国家自然科学基金

0+阅读 · 2013年12月31日

复杂形体时空动态变化生成技术

国家自然科学基金

0+阅读 · 2012年12月31日

基于感知哈希的视频篡改快速检测与多粒度定位技术研究

国家自然科学基金

0+阅读 · 2011年12月31日

中文自动口语摘要技术研究

国家自然科学基金

1+阅读 · 2011年12月31日

碳纳米管/泡沫炭复合材料制备、结构及增强机理

国家自然科学基金

0+阅读 · 2011年12月31日

阿部鲻鰕虎生物标记物对河口POPs污染的预警

国家自然科学基金

0+阅读 · 2009年12月31日

利用GPS与IM/WS干涉测量监测鲜水河断层变形

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员