One of the most popular paradigms for applying large pre-trained NLP models such as BERT is to fine-tune them on smaller downstream datasets. However, one challenge remains: the fine-tuned model often overfits on these smaller datasets. A symptom of this phenomenon is that irrelevant or misleading words in a sentence, which are easy for humans to recognize, can substantially degrade the performance of the fine-tuned BERT model. In this paper, we propose a novel technique, called Self-Supervised Attention (SSA), to address this generalization challenge. Specifically, SSA iteratively generates weak, token-level attention labels by probing the model fine-tuned in the previous iteration. We investigate two different ways of integrating SSA into BERT and propose a hybrid approach that combines their benefits. Empirically, on a variety of public datasets, we demonstrate significant performance improvements with our SSA-enhanced BERT model.
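To make the probing step concrete, the following is a minimal sketch of how weak, token-level attention labels could be generated from a fine-tuned classifier. It assumes a HuggingFace-style model, masking as the token-perturbation strategy, and a confidence-drop threshold; these specifics are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch: generate weak token-level attention labels by probing
# a fine-tuned classifier. A token gets label 1 ("informative") if masking it
# noticeably lowers the model's confidence in its own prediction, else 0.
# The threshold and masking strategy are assumptions for this sketch.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def weak_attention_labels(model, tokenizer, sentence, threshold=0.05):
    model.eval()
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**enc).logits, dim=-1)
    pred = probs.argmax(dim=-1).item()          # model's own prediction
    base_conf = probs[0, pred].item()           # confidence on the full sentence

    labels = []
    for i in range(enc["input_ids"].shape[1]):
        masked = {k: v.clone() for k, v in enc.items()}
        masked["input_ids"][0, i] = tokenizer.mask_token_id   # perturb one token
        with torch.no_grad():
            conf = torch.softmax(model(**masked).logits, dim=-1)[0, pred].item()
        labels.append(1 if base_conf - conf > threshold else 0)
    return labels

# Example usage with a (hypothetical) fine-tuned sentiment classifier:
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
clf = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
print(weak_attention_labels(clf, tok, "The movie was, honestly, surprisingly good."))
```

In the iterative scheme described above, such weak labels would supervise an auxiliary attention objective in the next fine-tuning round, and the probe-then-fine-tune loop repeats.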