抓住BERT的"铁道" (Catch the "Tails" of BERT) - 专知论文

会员服务 ·

0

Better · 向量化 · BERT · 词向量表示 · 词义消歧 ·

2020 年 11 月 30 日

Catch the "Tails" of BERT

翻译：抓住BERT的"铁道"

Recently, contextualized word embeddings outperform static word embeddings on many NLP tasks. However, we still do not know much about the mechanism inside these representations. Do they have any common patterns? If so, where do these patterns come from? We find that almost all the contextualized word vectors of BERT and RoBERTa have a common pattern. For BERT, the $557^{th}$ element is always the smallest. For RoBERTa, the $588^{th}$ element is always the largest, and the $77^{th}$ element is the smallest. We call them "tails" of models. We introduce a new neuron-level method to analyze where these "tails" come from. We find that these "tails" are closely related to the positional information. We also investigate what will happen if we "cutting the tails" (zero-out). Our results show that "tails" are the major cause of anisotropy of vector space. After "cutting the tails", a word's different vectors are more similar to each other. The internal representations have a better ability to distinguish a word's different senses with the word-in-context (WiC) dataset. The performance on the word sense disambiguation task is better for BERT and unchanged for RoBERTa. We can also better induce phrase grammar from the vector space. These suggest that "tails" are less related to the sense and syntax information in vectors. These findings provide insights into the inner workings of contextualized word vectors.

翻译：最近, 背景化的单词嵌入了比NLP 的许多任务中静态的单词嵌入比静态的单词更优。但是, 我们仍不十分了解这些表达式中的机制。是否它们有共同的模式? 如果是这样, 这些模式来自哪里? 我们发现几乎所有 BERT 和 RoBERTA 的背景化的单词矢量都有共同的模式。对于 BERT, 557 {th} $ 元素总是最小的。对 RobERTA 来说, $ 元素总是最大, 而 $ 元素是最小的。我们称之为模型中的“ 尾端 ” 。我们采用新的神经级方法来分析这些“ 尾端” 的来源。我们发现这些“ 尾端” 和 RobTA 的“ 尾部” 是最小的。内部表达式更有能力将“ 尾部” 的单词和“ 方向性能” 与“ 方向性能” 更好地区分 BC 。这些内部表达器能够将“ 感官” 和“ 感官” 的言词与“ 感化” 。我们的感官与“ 感化” 感官与“ 感官” 和“ 感化” 和“ 感化” 感化” 感化” 感官与“ 感官” 的感官与“ 感官能” 进行更好的感官与“ 。

0

相关内容

Better

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

161+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

站在BERT肩膀上的NLP新秀们：XLMs、MASS和UNILM

站在BERT肩膀上的NLP新秀们：XLMs、MASS和UNILM

PaperWeekly

16+阅读 · 2019年6月6日

站在BERT肩膀上的NLP新秀们（PART I）

站在BERT肩膀上的NLP新秀们（PART I）

AINLP

30+阅读 · 2019年6月4日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Exploring Lexical Irregularities in Hypothesis-OnlyModels of Natural Language Inference

Arxiv

0+阅读 · 2021年1月19日

Properties of using Fisher information gain for Bayesian design of experiments

Arxiv

0+阅读 · 2021年1月18日

Is the Chen-Sbert Divergence a Metric?

Arxiv

0+阅读 · 2021年1月15日

Visualizing and Measuring the Geometry of BERT

Visualizing and Measuring the Geometry of BERT

Arxiv

7+阅读 · 2019年10月28日

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

Arxiv

3+阅读 · 2019年9月12日

Revealing the Dark Secrets of BERT

Revealing the Dark Secrets of BERT

Arxiv

4+阅读 · 2019年9月11日

How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations

Arxiv

4+阅读 · 2019年9月11日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

What Does BERT Look At? An Analysis of BERT's Attention

Arxiv

4+阅读 · 2019年6月11日

How to Fine-Tune BERT for Text Classification?

How to Fine-Tune BERT for Text Classification?

Arxiv

13+阅读 · 2019年5月14日

VIP会员

文章信息

相关主题

词向量表示

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

161+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《“海马斯”高机动火箭炮系统：重塑为多任务平台》38页

《卫星星座任务规划新方法》

美国空军如何在电磁对抗环境中备战

【CMU博士论文】《生成式机器人：用于人机协同创作的自监督学习》

相关资讯

站在BERT肩膀上的NLP新秀们：XLMs、MASS和UNILM

站在BERT肩膀上的NLP新秀们：XLMs、MASS和UNILM

PaperWeekly

16+阅读 · 2019年6月6日

站在BERT肩膀上的NLP新秀们（PART I）

站在BERT肩膀上的NLP新秀们（PART I）

AINLP

30+阅读 · 2019年6月4日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Exploring Lexical Irregularities in Hypothesis-OnlyModels of Natural Language Inference

Arxiv

0+阅读 · 2021年1月19日

Properties of using Fisher information gain for Bayesian design of experiments

Arxiv

0+阅读 · 2021年1月18日

Is the Chen-Sbert Divergence a Metric?

Arxiv

0+阅读 · 2021年1月15日

Visualizing and Measuring the Geometry of BERT

Visualizing and Measuring the Geometry of BERT

Arxiv

7+阅读 · 2019年10月28日

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

Arxiv

3+阅读 · 2019年9月12日

Revealing the Dark Secrets of BERT

Revealing the Dark Secrets of BERT

Arxiv

4+阅读 · 2019年9月11日

How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations

Arxiv

4+阅读 · 2019年9月11日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

What Does BERT Look At? An Analysis of BERT's Attention

Arxiv

4+阅读 · 2019年6月11日

How to Fine-Tune BERT for Text Classification?

How to Fine-Tune BERT for Text Classification?

Arxiv

13+阅读 · 2019年5月14日

微信扫码咨询专知VIP会员