A Stability Analysis of Fine-Tuning a Pre-Trained Model - 专知 Papers


January 24, 2023

A Stability Analysis of Fine-Tuning a Pre-Trained Model


Zihao Fu, Anthony Man-Cho So, Nigel Collier

Fine-tuning a pre-trained model (such as BERT, ALBERT, RoBERTa, T5, GPT, etc.) has proven to be one of the most promising paradigms in recent NLP research. However, numerous recent works indicate that fine-tuning suffers from the instability problem, i.e., tuning the same model under the same setting results in significantly different performance. Many recent works have proposed different methods to solve this problem, but there is no theoretical understanding of why and how these methods work. In this paper, we propose a novel theoretical stability analysis of fine-tuning that focuses on two commonly used settings, namely, full fine-tuning and head tuning. We define the stability under each setting and prove the corresponding stability bounds. The theoretical bounds explain why and how several existing methods can stabilize the fine-tuning procedure. In addition to being able to explain most of the observed empirical discoveries, our proposed theoretical analysis framework can also help in the design of effective and provable methods. Based on our theory, we propose three novel strategies to stabilize the fine-tuning procedure, namely, Maximal Margin Regularizer (MMR), Multi-Head Loss (MHLoss), and Self Unsupervised Re-Training (SURT). We extensively evaluate our proposed approaches on 11 widely used real-world benchmark datasets, as well as hundreds of synthetic classification datasets. The experiment results show that our proposed methods significantly stabilize the fine-tuning procedure and also corroborate our theoretical analysis.
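The instability described above, where the same model tuned under the same setting ends up with different performance, can be made concrete with a toy measurement. The sketch below is not the paper's stability definition or any of its proposed methods. It assumes a head-tuning-like setup in which fixed 2-D features stand in for a frozen encoder, and it reports the spread of test accuracy when only the random seed changes. All names (`make_data`, `train_head`, `seed_spread`) and all hyperparameters are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def make_data(n, seed):
    """Toy 2-D binary data standing in for frozen encoder features (hypothetical)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        x = (rng.gauss(y, 1.0), rng.gauss(y, 1.0))
        data.append((x, y))
    return data


def train_head(train, seed, epochs=5, lr=0.1):
    """Fit a linear head by SGD on the logistic loss.

    The seed controls only the initial weights and the data order,
    so every run uses the same data and hyperparameters.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    b = 0.0
    order = list(train)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            # Derivative of log(1 + exp(-margin)) w.r.t. margin; clamp to avoid overflow.
            grad = -1.0 / (1.0 + math.exp(min(margin, 50.0)))
            w[0] -= lr * grad * y * x[0]
            w[1] -= lr * grad * y * x[1]
            b -= lr * grad * y
    return w, b


def accuracy(model, data):
    """Fraction of examples whose predicted sign matches the label."""
    w, b = model
    correct = sum(
        1 for x, y in data
        if (1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1) == y
    )
    return correct / len(data)


def seed_spread(train, test, seeds):
    """Mean and population standard deviation of test accuracy across seeds."""
    accs = [accuracy(train_head(train, s), test) for s in seeds]
    return statistics.mean(accs), statistics.pstdev(accs)


train = make_data(40, seed=1)
test = make_data(200, seed=2)
mean_acc, std_acc = seed_spread(train, test, seeds=range(5))
print(f"mean accuracy {mean_acc:.3f}, std across seeds {std_acc:.3f}")
```

A larger standard deviation across seeds corresponds to a less stable tuning procedure in this informal sense. Comparing that spread with and without a given stabilization strategy is one simple way to check empirically whether the strategy helps.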


