Prompt tuning is an emerging way of adapting pre-trained language models to downstream tasks. However, existing studies mainly add prompts to the input sequence, which may not work as expected: the prompts must pass through the intermediate multi-head self-attention and feed-forward network computations, making model optimization less smooth. Hence, we propose a novel tuning approach called layer tuning, which adds learnable parameters inside Transformer layers. Specifically, we focus on layer tuning for the feed-forward network in the Transformer, namely FL-tuning; see the sketch below for the basic idea. It introduces additional units into the hidden layer of each feed-forward network. We conduct extensive experiments on the public CLUE benchmark. The results show that: 1) Our FL-tuning outperforms prompt tuning methods under both full-data and few-shot settings in almost all cases. In particular, it improves accuracy by 17.93% (full-data setting) on WSC 1.0 and F1 by 16.142% (few-shot setting) on CLUENER over P-tuning v2. 2) Our FL-tuning is more stable and converges about 1.17 times faster than P-tuning v2. 3) With only about 3% of the Transformer's parameters to train, FL-tuning is comparable to fine-tuning on most datasets and significantly outperforms fine-tuning on several datasets (e.g., accuracy improved by 12.9% on WSC 1.1). The source code is available at https://github.com/genggui001/FL-Tuning.
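To make the idea concrete, the following is a minimal PyTorch sketch of adding trainable units to the hidden layer of a frozen feed-forward network; the module and parameter names (FLTunedFFN, n_extra, etc.) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class FLTunedFFN(nn.Module):
    """Sketch of FL-tuning: extra trainable hidden units added to a frozen FFN.

    Widening the FFN hidden layer with new units is equivalent to adding a
    parallel low-width branch whose output is summed with the original FFN
    output. Names and defaults here are illustrative, not from the paper.
    """

    def __init__(self, pretrained_ffn: nn.Module, d_model: int, n_extra: int):
        super().__init__()
        self.ffn = pretrained_ffn  # original W1 -> activation -> W2, kept frozen
        for p in self.ffn.parameters():
            p.requires_grad = False
        # New hidden units: new columns of W1 (up-projection) and new rows of W2 (down-projection).
        self.extra_up = nn.Linear(d_model, n_extra)
        self.extra_down = nn.Linear(n_extra, d_model, bias=False)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen pre-trained FFN output plus the contribution of the added units.
        return self.ffn(x) + self.extra_down(self.act(self.extra_up(x)))
```

In such a setup, only the added up/down projections are updated during training, which is what keeps the number of trainable parameters to a small fraction of the Transformer.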