On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies - Zhuanzhi (专知) Papers


Inductive bias · Statistics · Masking · Bias · Language modeling

April 12, 2021

On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies


Tianyi Zhang, Tatsunori Hashimoto

From arXiv, NAACL-HLT 2021

We study how masking and predicting tokens in an unsupervised fashion can give rise to linguistic structures and downstream performance gains. Recent theories have suggested that pretrained language models acquire useful inductive biases through masks that implicitly act as cloze reductions for downstream tasks. While appealing, we show that the success of the random masking strategy used in practice cannot be explained by such cloze-like masks alone. We construct cloze-like masks using task-specific lexicons for three different classification datasets and show that the majority of pretrained performance gains come from generic masks that are not associated with the lexicon. To explain the empirical success of these generic masks, we demonstrate a correspondence between the Masked Language Model (MLM) objective and existing methods for learning statistical dependencies in graphical models. Using this, we derive a method for extracting these learned statistical dependencies in MLMs and show that these dependencies encode useful inductive biases in the form of syntactic structures. In an unsupervised parsing evaluation, simply forming a minimum spanning tree on the implied statistical dependence structure outperforms a classic method for unsupervised parsing (58.74 vs. 55.91 UUAS).
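The parsing step mentioned at the end of the abstract (a spanning tree over pairwise dependence scores, evaluated with UUAS) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `spanning_tree_edges` and `uuas`, the toy score matrix, and the gold edges are all hypothetical, and the abstract does not say how the dependence scores are computed from an MLM. The sketch assumes a symmetric matrix in which larger values mean stronger dependence, so by default it keeps the strongest edges, which is the same as a minimum spanning tree over the negated scores.

```python
import numpy as np


def spanning_tree_edges(dep, maximize=True):
    """Prim's algorithm on a dense symmetric score matrix.

    Returns a set of undirected edges (i, j) with i < j.
    With maximize=True the tree keeps the strongest dependencies,
    i.e. a minimum spanning tree over the negated scores.
    """
    dep = np.asarray(dep, dtype=float)
    n = dep.shape[0]
    w = -dep if maximize else dep          # Prim below always minimizes w
    visited = np.zeros(n, dtype=bool)
    visited[0] = True                      # start the tree at token 0
    best_cost = w[0].copy()                # cheapest known link into the tree
    best_from = np.zeros(n, dtype=int)     # tree node providing that link
    edges = set()
    for _ in range(n - 1):
        candidates = np.where(visited, np.inf, best_cost)
        j = int(np.argmin(candidates))     # closest token not yet in the tree
        i = int(best_from[j])
        edges.add((min(i, j), max(i, j)))
        visited[j] = True
        better = (~visited) & (w[j] < best_cost)
        best_cost[better] = w[j][better]
        best_from[better] = j
    return edges


def uuas(pred_edges, gold_edges):
    """Undirected unlabeled attachment score: share of gold edges recovered."""
    gold = {(min(a, b), max(a, b)) for a, b in gold_edges}
    return len(pred_edges & gold) / len(gold)


# Toy, made-up dependence scores for a 4-token sentence (illustration only).
dep = np.array([
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.8, 0.3],
    [0.1, 0.8, 0.0, 0.7],
    [0.2, 0.3, 0.7, 0.0],
])
tree = spanning_tree_edges(dep)
print(sorted(tree))                        # [(0, 1), (1, 2), (2, 3)]

# Hypothetical gold dependency edges; direction is ignored by UUAS.
gold = [(0, 1), (2, 1), (1, 3)]
print(round(uuas(tree, gold), 3))          # 0.667
```

In this toy case the predicted tree recovers two of the three gold edges. The figures reported in the abstract (58.74 vs. 55.91 UUAS) come from the paper's own evaluation, not from anything in this sketch.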


