SemMAE: 为学习的蒙面自动编码器提供语义辅助遮罩 (SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders) - 专知论文

会员服务 ·

0

Learning · 掩码 · 掩码自编码MAE · 自编码器 · Integration ·

2022 年 6 月 21 日

SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

翻译：SemMAE: 为学习的蒙面自动编码器提供语义辅助遮罩

Gang Li,Heliang Zheng,Daqing Liu,Bing Su,Changwen Zheng

from arxiv, 12 pages, 3 figures

Recently, significant progress has been made in masked image modeling to catch up to masked language modeling. However, unlike words in NLP, the lack of semantic decomposition of images still makes masked autoencoding (MAE) different between vision and language. In this paper, we explore a potential visual analogue of words, i.e., semantic parts, and we integrate semantic information into the training process of MAE by proposing a Semantic-Guided Masking strategy. Compared to widely adopted random masking, our masking strategy can gradually guide the network to learn various information, i.e., from intra-part patterns to inter-part relations. In particular, we achieve this in two steps. 1) Semantic part learning: we design a self-supervised part learning method to obtain semantic parts by leveraging and refining the multi-head attention of a ViT-based encoder. 2) Semantic-guided MAE (SemMAE) training: we design a masking strategy that varies from masking a portion of patches in each part to masking a portion of (whole) parts in an image. Extensive experiments on various vision tasks show that SemMAE can learn better image representation by integrating semantic information. In particular, SemMAE achieves 84.5% fine-tuning accuracy on ImageNet-1k, which outperforms the vanilla MAE by 1.4%. In the semantic segmentation and fine-grained recognition tasks, SemMAE also brings significant improvements and yields the state-of-the-art performance.

翻译：最近,在蒙面图像建模以赶上遮面语言建模方面取得了显著进展。然而,与NLP中的文字不同的是,图像的语义分解缺乏语义分解仍然使得遮面自动编码(MAE)在视觉和语言之间有所差异。在本论文中,我们探索了一种潜在的视觉模拟词词,即语义部分,我们通过提出一种语义指南的遮罩策略,将语义信息纳入MAE的培训过程。与广泛采用的随机掩码改进相比,我们的遮面战略可以逐渐指导网络学习各种信息,即从部内模式到部分之间的关系。特别是,我们在两个步骤中实现了这个目的。 1) 语义部分学习了:我们设计了一种自我监督的部分学习方法,通过利用和完善基于 ViT 的代言方的多头注意力,将语义信息纳入MAE(SemMAE) 培训:我们设计一种国家掩码战略,从隐藏图像的精度部分到图像的精度部分,通过将SemMA- 图像的精度整合一个部分。

0

相关内容

Learning

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

专知会员服务

23+阅读 · 2022年3月3日

20篇「ICCV2021 Oral」最新论文抢先看！看当下计算机视觉在研究什么？

20篇「ICCV2021 Oral」最新论文抢先看！看当下计算机视觉在研究什么？

专知会员服务

62+阅读 · 2021年7月30日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】全卷积语义分割综述

【推荐】全卷积语义分割综述

机器学习研究会

19+阅读 · 2017年8月31日

用于治疗2型糖尿病的PPARα/γ双重激动剂的设计、合成与生物活性研究

国家自然科学基金

0+阅读 · 2014年12月31日

离子液体非水溶致液晶的构建及萃取分离性能研究

国家自然科学基金

0+阅读 · 2014年12月31日

水流－化学耦合作用下结构性重金属污染土的宏微观动力学特性研究

国家自然科学基金

0+阅读 · 2014年12月31日

主题模型建模框架下的高分辨率遥感影像半监督分类研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于HPLC-TPMT EAD-MS联用技术的天芪降糖胶囊中TPMT酶亲和活性成分研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于靶点拓扑异构酶和PARP1的吖啶类化合物的合成与抗肿瘤活性研究

国家自然科学基金

0+阅读 · 2012年12月31日

地下水中痕量卤素形态分析研究

国家自然科学基金

0+阅读 · 2012年12月31日

滇西老厂富银红土型锰矿次生富集机制及40Ar/39Ar年龄

国家自然科学基金

0+阅读 · 2012年12月31日

利用主链型偶氮苯液晶性嵌段共聚物构筑层状相分离液晶弹形体薄膜

国家自然科学基金

0+阅读 · 2009年12月31日

α65293;硫辛酸防治2型糖尿病并发症及其线粒体修复机制研究

国家自然科学基金

0+阅读 · 2008年12月31日

Unsupervised Domain Adaptation for Point Cloud Semantic Segmentation via Graph Matching

Arxiv

0+阅读 · 2022年8月9日

Exploiting Shape Cues for Weakly Supervised Semantic Segmentation

Arxiv

0+阅读 · 2022年8月8日

Towards Semantic Communications: Deep Learning-Based Image Semantic Coding

Arxiv

0+阅读 · 2022年8月8日

Masked Autoencoders Are Scalable Vision Learners

Arxiv

27+阅读 · 2021年11月11日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

CAN-NER: Convolutional Attention Network forChinese Named Entity Recognition

Arxiv

16+阅读 · 2019年4月3日

Dissecting Contextual Word Embeddings: Architecture and Representation

Dissecting Contextual Word Embeddings: Architecture and Representation

Arxiv

22+阅读 · 2018年8月27日

VIP会员

文章信息

相关主题

掩码自编码MAE

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

专知会员服务

23+阅读 · 2022年3月3日

20篇「ICCV2021 Oral」最新论文抢先看！看当下计算机视觉在研究什么？

20篇「ICCV2021 Oral」最新论文抢先看！看当下计算机视觉在研究什么？

专知会员服务

62+阅读 · 2021年7月30日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《代码、指挥与冲突：描绘军事人工智能的未来》报告

【斯坦福博士论文】面向地理空间数据的多模态与多尺度建模：时空生成式人工智能

美国启动“自有军事人工智能计划”：采用谷歌Gemini以推动全军人工智能应用

《创新与适应性作为军事成功的关键因素：来自俄乌战争的战略洞见》报告

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】全卷积语义分割综述

【推荐】全卷积语义分割综述

机器学习研究会

19+阅读 · 2017年8月31日

相关论文

Unsupervised Domain Adaptation for Point Cloud Semantic Segmentation via Graph Matching

Arxiv

0+阅读 · 2022年8月9日

Exploiting Shape Cues for Weakly Supervised Semantic Segmentation

Arxiv

0+阅读 · 2022年8月8日

Towards Semantic Communications: Deep Learning-Based Image Semantic Coding

Arxiv

0+阅读 · 2022年8月8日

Masked Autoencoders Are Scalable Vision Learners

Arxiv

27+阅读 · 2021年11月11日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

CAN-NER: Convolutional Attention Network forChinese Named Entity Recognition

Arxiv

16+阅读 · 2019年4月3日

Dissecting Contextual Word Embeddings: Architecture and Representation

Dissecting Contextual Word Embeddings: Architecture and Representation

Arxiv

22+阅读 · 2018年8月27日

相关基金

用于治疗2型糖尿病的PPARα/γ双重激动剂的设计、合成与生物活性研究

国家自然科学基金

0+阅读 · 2014年12月31日

离子液体非水溶致液晶的构建及萃取分离性能研究

国家自然科学基金

0+阅读 · 2014年12月31日

水流－化学耦合作用下结构性重金属污染土的宏微观动力学特性研究

国家自然科学基金

0+阅读 · 2014年12月31日

主题模型建模框架下的高分辨率遥感影像半监督分类研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于HPLC-TPMT EAD-MS联用技术的天芪降糖胶囊中TPMT酶亲和活性成分研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于靶点拓扑异构酶和PARP1的吖啶类化合物的合成与抗肿瘤活性研究

国家自然科学基金

0+阅读 · 2012年12月31日

地下水中痕量卤素形态分析研究

国家自然科学基金

0+阅读 · 2012年12月31日

滇西老厂富银红土型锰矿次生富集机制及40Ar/39Ar年龄

国家自然科学基金

0+阅读 · 2012年12月31日

利用主链型偶氮苯液晶性嵌段共聚物构筑层状相分离液晶弹形体薄膜

国家自然科学基金

0+阅读 · 2009年12月31日

α65293;硫辛酸防治2型糖尿病并发症及其线粒体修复机制研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员