Hard Patches Mining for Masked Image Modeling - 专知 Papers

Loss · Image models · Reconstruction · Masking · Effectiveness

April 12, 2023

Hard Patches Mining for Masked Image Modeling

Haochen Wang, Kaiyou Song, Junsong Fan, Yuxi Wang, Jin Xie, Zhaoxiang Zhang

From arXiv; accepted to CVPR 2023

Masked image modeling (MIM) has attracted much research attention due to its promising potential for learning scalable visual representations. In typical approaches, models usually focus on predicting specific contents of masked patches, and their performance is highly related to pre-defined mask strategies. Intuitively, this procedure can be considered as training a student (the model) on solving given problems (predicting masked patches). However, we argue that the model should not only focus on solving given problems, but also stand in the shoes of a teacher to produce a more challenging problem by itself. To this end, we propose Hard Patches Mining (HPM), a brand-new framework for MIM pre-training. We observe that the reconstruction loss can naturally serve as the metric of the difficulty of the pre-training task. Therefore, we introduce an auxiliary loss predictor, which first predicts patch-wise losses and then decides where to mask next. It adopts a relative relationship learning strategy to prevent overfitting to exact reconstruction loss values. Experiments under various settings demonstrate the effectiveness of HPM in constructing masked images. Furthermore, we empirically find that solely introducing the loss prediction objective leads to powerful representations, verifying the efficacy of being aware of which patches are hard to reconstruct.
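
The mechanism described in the abstract (predict per-patch losses, mask the patches predicted to be hardest, and train the predictor on relative order rather than exact values) can be illustrated with a small sketch. This is not the authors' implementation: the function names `select_hard_patches` and `relative_ranking_loss`, the `mask_ratio` parameter, the toy numbers, and the pairwise sigmoid cross-entropy form of the ranking objective are all assumptions made for illustration.

```python
import math


def select_hard_patches(predicted_losses, mask_ratio):
    """Return indices of the patches with the highest predicted loss.

    Assumption: "deciding where to mask next" is modeled here as a plain
    top-k selection over the predicted patch-wise losses.
    """
    num_masked = int(len(predicted_losses) * mask_ratio)
    ranked = sorted(range(len(predicted_losses)),
                    key=lambda i: predicted_losses[i], reverse=True)
    return sorted(ranked[:num_masked])


def relative_ranking_loss(predicted, actual):
    """Pairwise ranking loss that depends only on the order of `actual`.

    Assumption: for every pair (i, j), the predictor is asked whether
    patch i is harder than patch j, scored with binary cross-entropy on
    sigmoid(predicted[i] - predicted[j]). Exact loss values never enter
    the target, only their relative order.
    """
    total, pairs = 0.0, 0
    for i in range(len(actual)):
        for j in range(len(actual)):
            if i == j or actual[i] == actual[j]:
                continue  # skip self-pairs and ties
            target = 1.0 if actual[i] > actual[j] else 0.0
            prob = 1.0 / (1.0 + math.exp(-(predicted[i] - predicted[j])))
            prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # avoid log(0)
            total -= target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob)
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy example with four patches (made-up numbers).
predicted = [0.2, 1.5, 0.7, 0.1]       # output of a hypothetical loss predictor
reconstruction = [0.3, 2.0, 0.9, 0.05]  # hypothetical true per-patch losses

print(select_hard_patches(predicted, mask_ratio=0.5))  # → [1, 2]
print(relative_ranking_loss(predicted, reconstruction))
```

Because the target in this sketch is built only from comparisons, rescaling the true reconstruction losses leaves the ranking loss unchanged, which is one way to read the abstract's point about not overfitting to exact loss values.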
