Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer - 专知论文


July 25, 2022

Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer


Yingyi Chen, Xi Shen, Yahui Liu, Qinghua Tao, Johan A. K. Suykens

The success of Vision Transformer (ViT) in various computer vision tasks has promoted the ever-increasing prevalence of this convolution-free network. The fact that ViT works on image patches makes it potentially relevant to the problem of jigsaw puzzle solving, a classical self-supervised task that aims to reorder shuffled sequential image patches back to their natural form. Despite its simplicity, solving jigsaw puzzles has been demonstrated to be helpful for diverse tasks using Convolutional Neural Networks (CNNs), such as self-supervised feature representation learning, domain generalization, and fine-grained classification. In this paper, we explore solving jigsaw puzzles as a self-supervised auxiliary loss in ViT for image classification, named Jigsaw-ViT. We show two modifications that can make Jigsaw-ViT superior to standard ViT: discarding positional embeddings and masking patches randomly. Although simple, Jigsaw-ViT improves both generalization and robustness over the standard ViT, two properties that usually trade off against each other. Experimentally, we show that adding the jigsaw puzzle branch provides better generalization than ViT on large-scale image classification on ImageNet. Moreover, the auxiliary task also improves robustness to noisy labels on Animal-10N, Food-101N, and Clothing1M, as well as to adversarial examples. Our implementation is available at https://yingyichen-cyy.github.io/Jigsaw-ViT/.
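The idea of a jigsaw auxiliary loss with random patch masking can be illustrated with a small sketch. The following is not the authors' implementation: the function names, the per-patch position classification, the cross-entropy form of the jigsaw loss, and the weight `eta` are all assumptions made for illustration. The abstract only states that the jigsaw task is an auxiliary loss, that positional embeddings are discarded, and that patches are masked randomly.

```python
import math
import random


def make_jigsaw_targets(num_patches, mask_ratio, rng):
    """Shuffle patch positions and drop a random subset (hypothetical helper).

    Returns the original positions of the kept patches, in shuffled order.
    These positions serve as the targets of the jigsaw branch, which would
    receive the patches without positional embeddings.
    """
    positions = list(range(num_patches))
    rng.shuffle(positions)  # positions[i] = original index of the i-th input patch
    num_keep = max(1, int(num_patches * (1 - mask_ratio)))
    return positions[:num_keep]  # keeping a random prefix = random masking


def cross_entropy(logits, target):
    """Numerically stable cross-entropy for one prediction."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def jigsaw_loss(position_logits, targets):
    """Mean cross-entropy over kept patches (assumed form of the jigsaw loss)."""
    losses = [cross_entropy(l, t) for l, t in zip(position_logits, targets)]
    return sum(losses) / len(losses)


def total_loss(cls_loss, jig_loss, eta=0.1):
    """Classification loss plus weighted auxiliary loss; eta is an assumed weight."""
    return cls_loss + eta * jig_loss


rng = random.Random(0)
targets = make_jigsaw_targets(num_patches=16, mask_ratio=0.25, rng=rng)
# Stand-in for the jigsaw head: uniform logits over the 16 possible positions.
logits = [[0.0] * 16 for _ in targets]
print(len(targets), jigsaw_loss(logits, targets))
```

With uniform logits the jigsaw loss equals the logarithm of the number of positions, which is the value an untrained position classifier would start from. In a real model, the logits would come from a head applied to the ViT patch tokens.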

