In February of this year, Google proposed FLASH, a new Transformer variant that is faster, has a lower VRAM footprint, and achieves better performance. These gains come from a performant layer named GAU (Gated Attention Unit), which combines the attention layer and the FFN. In this paper, we re-analyze some of its implementation details both theoretically and empirically. We then propose a novel GAU-based model and pre-train it on a Chinese corpus. On the CLUE benchmark, our model achieves a dev average score of 75.02, 1% higher than RoFormerV1 while being 45% faster, and is also competitive with RoFormerV2.
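To make the layer concrete, below is a minimal PyTorch sketch of a GAU along the lines of the FLASH paper: the SiLU activations, the shared low-dimensional projection for queries and keys, and the squared-ReLU attention follow the original description, while the class and variable names and default hyperparameters are illustrative (relative position bias and attention masking are omitted for brevity).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAU(nn.Module):
    """A minimal Gated Attention Unit sketch; dimensions are illustrative."""
    def __init__(self, d_model=768, expansion=2, s=128):
        super().__init__()
        e = d_model * expansion
        self.s = s
        self.to_uv = nn.Linear(d_model, 2 * e)  # gate U and value V in one projection
        self.to_z = nn.Linear(d_model, s)       # shared low-dimensional projection Z
        # cheap per-dimension scale/offset that turns Z into queries and keys
        self.gamma = nn.Parameter(torch.ones(2, s))
        self.beta = nn.Parameter(torch.zeros(2, s))
        self.out = nn.Linear(e, d_model)

    def forward(self, x):                       # x: (batch, n, d_model)
        n = x.shape[1]
        u, v = F.silu(self.to_uv(x)).chunk(2, dim=-1)
        z = F.silu(self.to_z(x))
        q, k = (z.unsqueeze(2) * self.gamma + self.beta).unbind(dim=2)
        # squared-ReLU attention in place of softmax, normalized by length
        a = F.relu(torch.bmm(q, k.transpose(1, 2)) / self.s ** 0.5) ** 2 / n
        return self.out(u * torch.bmm(a, v))    # gated output: U ⊙ (A V)
```

Note how a single s-dimensional projection Z replaces the separate multi-head Q/K/V projections of a standard Transformer block, and the elementwise gate U replaces the FFN; this fusion is where GAU's speed and memory savings come from.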