Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping

October 19, 2022

Chenghao Yang, Xuezhe Ma

From arXiv; EMNLP 2022 camera-ready version

Fine-tuning large pretrained language models (PLMs) has established many state-of-the-art results. Despite its superior performance, such fine-tuning can be unstable, resulting in significant variance in performance and potential risks for practical applications. Previous works have attributed this instability to the catastrophic forgetting problem in the top layers of PLMs, which indicates that iteratively fine-tuning layers in a top-down manner is a promising solution. In this paper, we first point out that this method does not always work, because different layers and modules converge at different speeds. Inspired by this observation, we propose a simple component-wise gradient norm clipping method to adjust the convergence speed of different components. Experimental results demonstrate that our method achieves consistent improvements in generalization performance, convergence speed, and training stability. The codebase can be found at https://github.com/yangalan123/FineTuningStability.
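To make the idea of clipping gradient norms per component more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (that lives in the linked codebase). The grouping rule `component_of`, the function name `clip_grad_norm_by_component`, the parameter names, and the threshold `max_norm=1.0` are all hypothetical choices made for illustration. Gradients are represented as flat lists of floats rather than tensors so that the example runs without any deep learning library.

```python
import math
from collections import defaultdict


def component_of(param_name):
    """Hypothetical grouping rule: map a parameter name to a component label."""
    for label in ("query", "key", "value", "classifier"):
        if label in param_name:
            return label
    return "other"


def clip_grad_norm_by_component(grads, max_norm, eps=1e-6):
    """Clip the L2 gradient norm of each component separately.

    grads: dict mapping parameter name -> flat list of gradient values.
    The dict is updated so that every component's norm is at most max_norm.
    Returns a dict of the pre-clipping L2 norm for each component.
    """
    # Collect the parameter names that belong to each component.
    groups = defaultdict(list)
    for name in grads:
        groups[component_of(name)].append(name)

    norms = {}
    for comp, names in groups.items():
        # L2 norm over all gradient entries of this component only.
        total = math.sqrt(sum(g * g for n in names for g in grads[n]))
        norms[comp] = total
        scale = max_norm / (total + eps)
        if scale < 1.0:
            # Shrink only this component; other components are untouched.
            for n in names:
                grads[n] = [g * scale for g in grads[n]]
    return norms


# Toy gradients (assumed values, not taken from the paper).
grads = {
    "layer.0.attention.query.weight": [3.0, 4.0],  # norm 5.0
    "layer.0.attention.key.weight": [0.3, 0.4],    # norm 0.5
    "classifier.weight": [6.0, 8.0],               # norm 10.0
}
norms = clip_grad_norm_by_component(grads, max_norm=1.0)
```

In this toy run the `query` and `classifier` groups are each rescaled to a norm of about 1, while the `key` group, already below the threshold, is left unchanged. A single global clip would instead apply one scaling factor derived from the norm of all gradients together, so the component with the largest gradients would determine how much every other component is shrunk.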
