公路变异器:自建增强自控网络 (Highway Transformer: Self-Gating Enhanced Self-Attentive Networks) - 专知论文

会员服务 ·

0

INFORMS · 门控机制 · Networking · Transformer · state-of-the-art ·

2020 年 11 月 24 日

Highway Transformer: Self-Gating Enhanced Self-Attentive Networks

翻译：公路变异器:自建增强自控网络

Yekun Chai,Shuo Jin,Xinwen Hou

from arxiv, Accepted at ACL 2020

Self-attention mechanisms have made striking state-of-the-art (SOTA) progress in various sequence learning tasks, standing on the multi-headed dot product attention by attending to all the global contexts at different locations. Through a pseudo information highway, we introduce a gated component self-dependency units (SDU) that incorporates LSTM-styled gating units to replenish internal semantic importance within the multi-dimensional latent space of individual representations. The subsidiary content-based SDU gates allow for the information flow of modulated latent embeddings through skipped connections, leading to a clear margin of convergence speed with gradient descent algorithms. We may unveil the role of gating mechanism to aid in the context-based Transformer modules, with hypothesizing that SDU gates, especially on shallow layers, could push it faster to step towards suboptimal points during the optimization process.

翻译：自留机制在各种顺序学习任务方面取得了惊人的进展,通过关注不同地点的所有全球背景,站在多点产品上关注多点产品,关注不同地点的所有全球背景。我们通过假信息高速公路,引入了门式组件自依赖单位(SDU),配有LSTM制式的格子单元,以补充在个人代表的多维潜在空间内的内部语义重要性。基于内容的辅助工具门允许通过跳过连接调节的潜潜伏嵌入的信息流动,从而导致与梯度下沉算法之间的明显趋同速度差。我们可以公布在基于背景的变压器模块中帮助的格子机制的作用,并假设DUD门,特别是在浅层,可以在优化过程中将其推向次优化点。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

基于上下文化图注意力网络的知识图谱的条目推荐，Contextualized Graph Attention Network for Recommendation with Item Knowledge Graph

基于上下文化图注意力网络的知识图谱的条目推荐，Contextualized Graph Attention Network for Recommendation with Item Knowledge Graph

专知会员服务

101+阅读 · 2020年6月28日

【ICLR 2019】双曲注意力网络，Hyperbolic Attention Network

【ICLR 2019】双曲注意力网络，Hyperbolic Attention Network

专知会员服务

84+阅读 · 2020年6月21日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

专知会员服务

92+阅读 · 2019年12月22日

【CIKM 2019论文】基于Motif注意力的图卷积网络（Graph Convolutional Networks with Motif-based Attention），John Boaz Lee，Ryan Rossi，孔祥南

【CIKM 2019论文】基于Motif注意力的图卷积网络（Graph Convolutional Networks with Motif-based Attention），John Boaz Lee，Ryan Rossi，孔祥南

专知会员服务

53+阅读 · 2019年11月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

注意力机制介绍，Attention Mechanism

注意力机制介绍，Attention Mechanism

专知会员服务

171+阅读 · 2019年10月13日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

一文读懂Attention机制

一文读懂Attention机制

机器学习与推荐算法

63+阅读 · 2020年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

从Seq2seq到Attention模型到Self Attention（二）

从Seq2seq到Attention模型到Self Attention（二）

量化投资与机器学习

23+阅读 · 2018年10月9日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

基于LSTM-CNN组合模型的Twitter情感分析（附代码）

基于LSTM-CNN组合模型的Twitter情感分析（附代码）

机器学习研究会

50+阅读 · 2018年2月21日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

Highway Networks For Sentence Classification

Highway Networks For Sentence Classification

哈工大SCIR

4+阅读 · 2017年9月30日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Not All Attention Is Needed: Gated Attention Network for Sequence Data

Arxiv

3+阅读 · 2019年12月1日

Relation-Aware Graph Attention Network for Visual Question Answering

Relation-Aware Graph Attention Network for Visual Question Answering

Arxiv

7+阅读 · 2019年10月9日

Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks

Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks

Arxiv

10+阅读 · 2019年9月5日

Graph Convolutional Neural Networks via Motif-based Attention

Arxiv

10+阅读 · 2019年2月25日

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Arxiv

5+阅读 · 2018年12月26日

Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks

Arxiv

6+阅读 · 2018年9月6日

Knowledge Graph Embedding with Entity Neighbors and Deep Memory Network

Knowledge Graph Embedding with Entity Neighbors and Deep Memory Network

Arxiv

3+阅读 · 2018年8月11日

Bilinear Attention Networks

Arxiv

11+阅读 · 2018年5月21日

Reciprocal Attention Fusion for Visual Question Answering

Arxiv

5+阅读 · 2018年5月11日

Dual Recurrent Attention Units for Visual Question Answering

Arxiv

7+阅读 · 2018年2月1日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

基于上下文化图注意力网络的知识图谱的条目推荐，Contextualized Graph Attention Network for Recommendation with Item Knowledge Graph

基于上下文化图注意力网络的知识图谱的条目推荐，Contextualized Graph Attention Network for Recommendation with Item Knowledge Graph

专知会员服务

101+阅读 · 2020年6月28日

【ICLR 2019】双曲注意力网络，Hyperbolic Attention Network

【ICLR 2019】双曲注意力网络，Hyperbolic Attention Network

专知会员服务

84+阅读 · 2020年6月21日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

专知会员服务

92+阅读 · 2019年12月22日

【CIKM 2019论文】基于Motif注意力的图卷积网络（Graph Convolutional Networks with Motif-based Attention），John Boaz Lee，Ryan Rossi，孔祥南

【CIKM 2019论文】基于Motif注意力的图卷积网络（Graph Convolutional Networks with Motif-based Attention），John Boaz Lee，Ryan Rossi，孔祥南

专知会员服务

53+阅读 · 2019年11月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

注意力机制介绍，Attention Mechanism

注意力机制介绍，Attention Mechanism

专知会员服务

171+阅读 · 2019年10月13日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

《生态建模密码破译：建模与编程实践》美陆军最新报告

大模型解决方案白皮书：社交陪伴场景全流程落地指南

面向具身操作的视觉-语言-动作模型综述

相关资讯

一文读懂Attention机制

一文读懂Attention机制

机器学习与推荐算法

63+阅读 · 2020年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

从Seq2seq到Attention模型到Self Attention（二）

从Seq2seq到Attention模型到Self Attention（二）

量化投资与机器学习

23+阅读 · 2018年10月9日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

基于LSTM-CNN组合模型的Twitter情感分析（附代码）

基于LSTM-CNN组合模型的Twitter情感分析（附代码）

机器学习研究会

50+阅读 · 2018年2月21日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

Highway Networks For Sentence Classification

Highway Networks For Sentence Classification

哈工大SCIR

4+阅读 · 2017年9月30日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Not All Attention Is Needed: Gated Attention Network for Sequence Data

Arxiv

3+阅读 · 2019年12月1日

Relation-Aware Graph Attention Network for Visual Question Answering

Relation-Aware Graph Attention Network for Visual Question Answering

Arxiv

7+阅读 · 2019年10月9日

Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks

Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks

Arxiv

10+阅读 · 2019年9月5日

Graph Convolutional Neural Networks via Motif-based Attention

Arxiv

10+阅读 · 2019年2月25日

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Arxiv

5+阅读 · 2018年12月26日

Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks

Arxiv

6+阅读 · 2018年9月6日

Knowledge Graph Embedding with Entity Neighbors and Deep Memory Network

Knowledge Graph Embedding with Entity Neighbors and Deep Memory Network

Arxiv

3+阅读 · 2018年8月11日

Bilinear Attention Networks

Arxiv

11+阅读 · 2018年5月21日

Reciprocal Attention Fusion for Visual Question Answering

Arxiv

5+阅读 · 2018年5月11日

Dual Recurrent Attention Units for Visual Question Answering

Arxiv

7+阅读 · 2018年2月1日

微信扫码咨询专知VIP会员