Vision transformers (ViTs) have recently demonstrated state-of-the-art performance in a variety of vision tasks, replacing convolutional neural networks (CNNs). Since ViT has a different architecture from CNNs, however, it may behave differently. To investigate the reliability of ViT, this paper studies its behavior and robustness. We compared the robustness of CNN and ViT under various image corruptions that may appear in practical vision tasks. We confirmed that for most image transformations, ViT showed robustness comparable to or better than CNN. However, for contrast enhancement, severe performance degradation was consistently observed in ViT. From a detailed analysis, we identified a potential problem: the positional embedding in ViT's patch embedding can behave improperly when the color scale changes. Here we propose PreLayerNorm, a modified patch embedding structure that ensures scale-invariant behavior of ViT. ViT with PreLayerNorm showed improved robustness under various corruptions, including contrast-varying environments.
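The core idea can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the module name `PreLayerNormPatchEmbed` and all hyperparameters are assumptions. The point is that applying LayerNorm to each flattened patch *before* the linear projection divides out a global contrast (scale) change, so the learned positional embedding is added to tokens whose magnitude no longer depends on the image's color scale.

```python
import torch
import torch.nn as nn

class PreLayerNormPatchEmbed(nn.Module):
    """Patch embedding with LayerNorm applied before projection (illustrative sketch)."""

    def __init__(self, img_size=32, patch_size=4, in_chans=3, embed_dim=64):
        super().__init__()
        self.patch_size = patch_size
        num_patches = (img_size // patch_size) ** 2
        patch_dim = in_chans * patch_size * patch_size
        self.norm = nn.LayerNorm(patch_dim)          # normalizes each patch before projection
        self.proj = nn.Linear(patch_dim, embed_dim)  # patch -> token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

    def forward(self, x):
        # x: (B, C, H, W) -> non-overlapping patches -> (B, num_patches, patch_dim)
        p = self.patch_size
        x = x.unfold(2, p, p).unfold(3, p, p)                    # (B, C, H/p, W/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5).flatten(1, 2).flatten(2)  # (B, N, C*p*p)
        # LayerNorm removes each patch's mean and scale, so multiplying the
        # image by a constant (a contrast change) leaves the tokens, and hence
        # the positional embedding's relative contribution, nearly unchanged.
        x = self.norm(x)
        return self.proj(x) + self.pos_embed
```

Because LayerNorm subtracts the per-patch mean and divides by the per-patch standard deviation, an input scaled by any constant `c` produces (up to the small `eps` term) identical tokens, which is the scale-invariant behavior described above.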