Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP tasks. Despite its recent success and wide adoption, this process is unstable when there are only a small number of training samples available. The brittleness of the process is often reflected in its sensitivity to random seeds. In this paper, we propose to tackle this problem based on the noise stability property of deep nets, which has been investigated in recent literature (Arora et al., 2018; Sanyal et al., 2020). Specifically, we introduce a novel and effective regularization method to improve fine-tuning on NLP tasks, referred to as Layer-wise Noise Stability Regularization (LNSR). We extend the theory of adding noise to the input and prove that our method yields a more stable regularization effect. We provide supporting evidence by experimentally confirming that well-performing models show low sensitivity to noise and that fine-tuning with LNSR exhibits clearly higher generalizability and stability. Furthermore, our method also demonstrates advantages over other state-of-the-art algorithms, including L2-SP (Li et al., 2018), Mixout (Lee et al., 2020) and SMART (Jiang et al., 2020).
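As a rough illustration of the idea (a minimal sketch, not the paper's exact formulation), the code below injects Gaussian noise into BERT's embedding output and penalizes the L2 drift of the subsequent layers' hidden states; the noise scale noise_std, the weight lnsr_weight, and the choice of perturbed layer are illustrative assumptions.

```python
# Minimal sketch of a layer-wise noise stability penalty for BERT fine-tuning,
# assuming the general recipe described in the abstract (perturb a hidden
# representation with Gaussian noise, penalize the shift in later layers).
# `noise_std` and `lnsr_weight` are illustrative hyperparameters.
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

noise_std = 0.01    # assumed noise scale
lnsr_weight = 1.0   # assumed regularization strength


def lnsr_penalty(model, input_ids, attention_mask):
    """Sum of L2 distances between clean and noise-perturbed layer outputs."""
    # Clean forward pass, keeping every layer's hidden states
    # (hidden_states[0] is the embedding output, [1:] are encoder layers).
    clean = model(input_ids, attention_mask=attention_mask,
                  output_hidden_states=True).hidden_states

    # Perturb the embedding output with Gaussian noise and re-run the encoder.
    embeddings = clean[0]
    noisy_input = embeddings + noise_std * torch.randn_like(embeddings)

    # Standard additive attention mask expected by the encoder.
    ext_mask = attention_mask[:, None, None, :].to(dtype=embeddings.dtype)
    ext_mask = (1.0 - ext_mask) * torch.finfo(embeddings.dtype).min

    noisy = model.encoder(noisy_input, attention_mask=ext_mask,
                          output_hidden_states=True).hidden_states

    # Penalize how far each layer's output drifts under the input noise.
    drift = sum(F.mse_loss(n, c) for n, c in zip(noisy[1:], clean[1:]))
    return lnsr_weight * drift


# Example usage (illustrative); during fine-tuning the penalty would simply
# be added to the task loss, e.g. loss = task_loss + lnsr_penalty(...).
enc = tokenizer("fine-tuning can be unstable", return_tensors="pt")
penalty = lnsr_penalty(model, enc["input_ids"], enc["attention_mask"])
```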