嵌入嵌入位置需要独立的图层正常化 (Position Embedding Needs an Independent Layer Normalization) - 专知论文

会员服务 ·

0

VTS · 层 · 相互独立的 · 层规范化 · 位置嵌入 ·

2022 年 12 月 22 日

Position Embedding Needs an Independent Layer Normalization

翻译：嵌入嵌入位置需要独立的图层正常化

Runyi Yu,Zhennan Wang,Yinhuai Wang,Kehan Li,Yian Zhao,Jian Zhang,Guoli Song,Jie Chen

from arxiv, 14 pages, 8 figures

The Position Embedding (PE) is critical for Vision Transformers (VTs) due to the permutation-invariance of self-attention operation. By analyzing the input and output of each encoder layer in VTs using reparameterization and visualization, we find that the default PE joining method (simply adding the PE and patch embedding together) operates the same affine transformation to token embedding and PE, which limits the expressiveness of PE and hence constrains the performance of VTs. To overcome this limitation, we propose a simple, effective, and robust method. Specifically, we provide two independent layer normalizations for token embeddings and PE for each layer, and add them together as the input of each layer's Muti-Head Self-Attention module. Since the method allows the model to adaptively adjust the information of PE for different layers, we name it as Layer-adaptive Position Embedding, abbreviated as LaPE. Extensive experiments demonstrate that LaPE can improve various VTs with different types of PE and make VTs robust to PE types. For example, LaPE improves 0.94% accuracy for ViT-Lite on Cifar10, 0.98% for CCT on Cifar100, and 1.72% for DeiT on ImageNet-1K, which is remarkable considering the negligible extra parameters, memory and computational cost brought by LaPE. The code is publicly available at https://github.com/Ingrid725/LaPE.

翻译：位置嵌入器( PE) 对视野变异器( VT) 至关重要, 原因是自我注意操作的变异性差变。分析 VT 中每个编码层的输入和输出时, 我们发现默认的 PE 组合法( 简单添加 PE 和补丁嵌入) 运行相同的折线转换为代号嵌入和 PE, 这会限制 PE 的表达性, 从而限制 VT 的性能。为了克服这一限制, 我们提出了简单、有效和稳健的方法。具体地说, 我们通过对 VT 的代号嵌入和 PE 进行两个独立的层化分解。我们发现默认的 PE 组合法( 简单添加 PE) 和 PE, 并把它们合并为每个层的 Muti - 首席自我注意模块的输入。由于该方法允许该模型对 PE 的信息进行适应性调整, 我们将其命名为图层- 适应性定位定位定位, 缩成 LaPE 。广泛的实验显示, 可以用不同类别的 VT, PE 和 MIE 格式的格式的精确性格式的格式是 CILE 和的。的级级级的的级级的的级级的级级级级级的级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级级

0

相关内容

VTS

VTS：VLSI Test Symposium Explanation：超大规模集成电路测试研讨会。 Publisher：IEEE。 SIT： http://dblp.uni-trier.de/db/conf/vts/

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

Lnc-TRMT2A竞争性结合miR-520a调控炎性通路在精神分裂症发病中的作用研究

国家自然科学基金

0+阅读 · 2015年12月31日

寡层过渡金属二硫族化合物电子结构的角分辨光电子能谱研究

国家自然科学基金

0+阅读 · 2015年12月31日

NOD2介导的自噬在糖尿病肾病肾小管损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

缺氧诱导的microRNA调控TIP30表达促进胰腺癌转移作用机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

长链非编码RNA-uc002mbe.2介导的HDACi凋亡效应及其在肝癌中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

胸腺细胞分化抗原-1DNA甲基化介导铍化合物致肺部纤维化的调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

表面活性蛋白-A在脓毒症诱导的肾小管上皮细胞损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

低热导率三元稀土硫族化合物的热电特性调制

国家自然科学基金

0+阅读 · 2012年12月31日

妊娠小鼠骨组织TRPV5和TRPV6的表达规律及其在胎盘钙转运中的作用探讨

国家自然科学基金

0+阅读 · 2012年12月31日

PARP-1/AIF信号通路在重离子诱导神经细胞凋亡中的调控作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

Arxiv

0+阅读 · 2023年2月20日

Warped Gradient-Enhanced Gaussian Process Surrogate Models for Exponential Family Likelihoods with Intractable Normalizing Constants

Arxiv

0+阅读 · 2023年2月20日

Building Shortcuts between Distant Nodes with Biaffine Mapping for Graph Convolutional Networks

Arxiv

0+阅读 · 2023年2月17日

Knowledge Graph Embedding: A Survey from the Perspective of Representation Spaces

Arxiv

18+阅读 · 2022年11月7日

Powerful Graph Convolutioal Networks with Adaptive Propagation Mechanism for Homophily and Heterophily

Arxiv

20+阅读 · 2021年12月27日

Knowledge Embedding Based Graph Convolutional Network

Knowledge Embedding Based Graph Convolutional Network

Arxiv

24+阅读 · 2021年4月23日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Learning Embedding Adaptation for Few-Shot Learning

Learning Embedding Adaptation for Few-Shot Learning

Arxiv

17+阅读 · 2018年12月10日

Convolutional 2D Knowledge Graph Embeddings

Arxiv

29+阅读 · 2018年4月6日

Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly

Arxiv

18+阅读 · 2018年1月15日

VIP会员

文章信息

相关主题

相互独立的

相关VIP内容

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

操作系统智能体：基于多模态大模型（MLLM）的通用计算设备智能体综述

《美国太空军系统全生命周期建模、仿真与分析效能提升方案》最新84页报告

【博士论文】推进数据高效的深度学习：非参数 Transformer、主动测试与上下文学习

自主人工智能：未来战争是否将是自主化的？

相关资讯

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

相关论文

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

Arxiv

0+阅读 · 2023年2月20日

Warped Gradient-Enhanced Gaussian Process Surrogate Models for Exponential Family Likelihoods with Intractable Normalizing Constants

Arxiv

0+阅读 · 2023年2月20日

Building Shortcuts between Distant Nodes with Biaffine Mapping for Graph Convolutional Networks

Arxiv

0+阅读 · 2023年2月17日

Knowledge Graph Embedding: A Survey from the Perspective of Representation Spaces

Arxiv

18+阅读 · 2022年11月7日

Powerful Graph Convolutioal Networks with Adaptive Propagation Mechanism for Homophily and Heterophily

Arxiv

20+阅读 · 2021年12月27日

Knowledge Embedding Based Graph Convolutional Network

Knowledge Embedding Based Graph Convolutional Network

Arxiv

24+阅读 · 2021年4月23日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Learning Embedding Adaptation for Few-Shot Learning

Learning Embedding Adaptation for Few-Shot Learning

Arxiv

17+阅读 · 2018年12月10日

Convolutional 2D Knowledge Graph Embeddings

Arxiv

29+阅读 · 2018年4月6日

Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly

Arxiv

18+阅读 · 2018年1月15日

相关基金

Lnc-TRMT2A竞争性结合miR-520a调控炎性通路在精神分裂症发病中的作用研究

国家自然科学基金

0+阅读 · 2015年12月31日

寡层过渡金属二硫族化合物电子结构的角分辨光电子能谱研究

国家自然科学基金

0+阅读 · 2015年12月31日

NOD2介导的自噬在糖尿病肾病肾小管损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

缺氧诱导的microRNA调控TIP30表达促进胰腺癌转移作用机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

长链非编码RNA-uc002mbe.2介导的HDACi凋亡效应及其在肝癌中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

胸腺细胞分化抗原-1DNA甲基化介导铍化合物致肺部纤维化的调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

表面活性蛋白-A在脓毒症诱导的肾小管上皮细胞损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

低热导率三元稀土硫族化合物的热电特性调制

国家自然科学基金

0+阅读 · 2012年12月31日

妊娠小鼠骨组织TRPV5和TRPV6的表达规律及其在胎盘钙转运中的作用探讨

国家自然科学基金

0+阅读 · 2012年12月31日

PARP-1/AIF信号通路在重离子诱导神经细胞凋亡中的调控作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员