Despite the success of fine-tuning pretrained language encoders like BERT for downstream natural language understanding (NLU) tasks, it is still poorly understood how neural networks change after fine-tuning. In this work, we use centered kernel alignment (CKA), a method for comparing learned representations, to measure the similarity of representations in task-tuned models across layers. In experiments across twelve NLU tasks, we discover a consistent block-diagonal structure in the similarity of representations within fine-tuned RoBERTa and ALBERT models: representations are strongly similar within clusters of earlier and later layers, but not across the two clusters. The similarity of later-layer representations implies that these layers contribute only marginally to task performance, and we verify in experiments that the top few layers of fine-tuned Transformers can be discarded without hurting performance, even with no further tuning.
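To make the measurement concrete, below is a minimal sketch of how linear CKA could be computed between per-layer hidden states to obtain the kind of layer-by-layer similarity matrix described above. This is an illustration under stated assumptions, not the paper's released code: `linear_cka`, `cka_matrix`, and `layer_outputs` are hypothetical names, and the formula is the standard linear CKA of Kornblith et al. (2019).

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two representation matrices.

    x: (n_examples, d1) activations from one layer
    y: (n_examples, d2) activations from another layer
    Returns a similarity score in [0, 1].
    """
    # Center each feature dimension across examples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)

def cka_matrix(layer_outputs):
    """Pairwise linear CKA over a list of (n_examples, hidden_size) arrays,
    one per Transformer layer, collected on a fixed evaluation set."""
    n = len(layer_outputs)
    sims = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sims[i, j] = linear_cka(layer_outputs[i], layer_outputs[j])
    return sims

# Example (hypothetical): visualizing cka_matrix(layer_outputs) for a
# fine-tuned encoder would show the block-diagonal structure, i.e. high
# similarity within the earlier-layer block and within the later-layer
# block, but low similarity between the two blocks.
```

A matrix like this, computed separately for each fine-tuned model, is what reveals the earlier-layer and later-layer clusters described in the abstract.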