语言依赖和统计依赖 (Linguistic Dependencies and Statistical Dependence) - 专知论文

会员服务 ·

0

统计量 · 估计/估计量 · Extensibility · MoDELS · 语言模型化 ·

2021 年 9 月 10 日

Linguistic Dependencies and Statistical Dependence

翻译：语言依赖和统计依赖

Jacob Louis Hoover,Alessandro Sordoni,Wenyu Du,Timothy J. O'Donnell

from arxiv, EMNLP2021 camera-ready version. 9 pages, plus references and appendices

Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we contribute an extensive analysis of the relationship between linguistic dependencies and statistical dependence between words. Improving on previous work, we introduce the use of large pretrained language models to compute contextualized estimates of the pointwise mutual information between words (CPMI). For multiple models and languages, we extract dependency trees which maximize CPMI, and compare to gold standard linguistic dependencies. Overall, we find that CPMI dependencies achieve an unlabelled undirected attachment score of at most $\approx 0.5$. While far above chance, and consistently above a non-contextualized PMI baseline, this score is generally comparable to a simple baseline formed by connecting adjacent words. We analyze which kinds of linguistic dependencies are best captured in CPMI dependencies, and also find marked differences between the estimates of the large pretrained language models, illustrating how their different training schemes affect the type of dependencies they capture.

翻译：在这项工作中,我们广泛分析了语言依赖性和语言之间的统计依赖性之间的关系。改进了以前的工作,我们采用了大型预先培训的语言模型来计算对语言之间点信息(CPMI)的背景估计值。对于多种模式和语言,我们提取依赖性树,以尽量扩大CPMI, 并与金质标准语言依赖性作比较。总体而言,我们发现CPMI依赖性获得无标签的无指导性附着得分,最高为$\approx 0.5美元。虽然远高于机会,而且始终高于非通俗的PMI基线,但这一得分通常与连接相邻语言的简单基线相近。我们分析哪些语言依赖性在CPMI依赖性中最能捕捉到,并发现大型预先培训语言模型的估计数之间存在明显差异,同时说明它们不同的培训计划如何影响它们所捕捉的相互依存性类型。

0

相关内容

统计量

不可错过! CMU CMU《高级自然语言处理》结课了，附课件与视频

不可错过! CMU CMU《高级自然语言处理》结课了，附课件与视频

专知会员服务

73+阅读 · 2021年10月4日

【EMNLP2020】自然语言处理模型可解释性预测，182页ppt

【EMNLP2020】自然语言处理模型可解释性预测，182页ppt

专知会员服务

51+阅读 · 2020年11月19日

一份简单《图神经网络》教程，28页ppt

一份简单《图神经网络》教程，28页ppt

专知会员服务

126+阅读 · 2020年8月2日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【DeepMind深度学习课程】序列循环神经网络，141页ppt，Sequences and Recurrent Network

【DeepMind深度学习课程】序列循环神经网络，141页ppt，Sequences and Recurrent Network

专知会员服务

86+阅读 · 2020年6月23日

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

专知会员服务

20+阅读 · 2020年4月25日

【ICLR2020】利用图神经网络进行高效概率逻辑推理，Efficient Probabilistic Logic Reasoning with Graph Neural Networks

【ICLR2020】利用图神经网络进行高效概率逻辑推理，Efficient Probabilistic Logic Reasoning with Graph Neural Networks

专知会员服务

113+阅读 · 2020年1月29日

腾讯信息流内容理解技术实践，A User-Centered Concept Mining System for Query and Document Understanding at Tencent

腾讯信息流内容理解技术实践，A User-Centered Concept Mining System for Query and Document Understanding at Tencent

专知会员服务

40+阅读 · 2019年12月15日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

CCF推荐 | 国际会议信息10条

CCF推荐 | 国际会议信息10条

Call4Papers

8+阅读 · 2019年5月27日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

已删除

将门创投

8+阅读 · 2019年1月4日

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

专知

15+阅读 · 2018年6月29日

Linguistically Regularized LSTMs for Sentiment Classification

Linguistically Regularized LSTMs for Sentiment Classification

黑龙江大学自然语言处理实验室

8+阅读 · 2018年5月4日

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

专知

11+阅读 · 2018年3月29日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

Extended probabilities in Statistics

Arxiv

0+阅读 · 2021年11月1日

Holistic Deep Learning

Holistic Deep Learning

Arxiv

0+阅读 · 2021年10月29日

Statistical Review of Animal trials -- A Guideline

Arxiv

0+阅读 · 2021年10月29日

A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning

Arxiv

7+阅读 · 2021年8月14日

A Probabilistic Representation of DNNs: Bridging Mutual Information and Generalization

Arxiv

17+阅读 · 2021年6月18日

Modeling Multi-Label Action Dependencies for Temporal Action Localization

Arxiv

3+阅读 · 2021年3月4日

Do NLP Models Know Numbers? Probing Numeracy in Embeddings

Arxiv

5+阅读 · 2019年9月17日

Hierarchically-Refined Label Attention Network for Sequence Labeling

Hierarchically-Refined Label Attention Network for Sequence Labeling

Arxiv

3+阅读 · 2019年8月23日

Deep learning evaluation using deep linguistic processing

Arxiv

3+阅读 · 2018年5月12日

DeepProbe: Information Directed Sequence Understanding and Chatbot Design via Recurrent Neural Networks

Arxiv

9+阅读 · 2018年3月1日

VIP会员

文章信息

相关主题

估计/估计量

语言模型化

相关VIP内容

不可错过! CMU CMU《高级自然语言处理》结课了，附课件与视频

不可错过! CMU CMU《高级自然语言处理》结课了，附课件与视频

专知会员服务

73+阅读 · 2021年10月4日

【EMNLP2020】自然语言处理模型可解释性预测，182页ppt

【EMNLP2020】自然语言处理模型可解释性预测，182页ppt

专知会员服务

51+阅读 · 2020年11月19日

一份简单《图神经网络》教程，28页ppt

一份简单《图神经网络》教程，28页ppt

专知会员服务

126+阅读 · 2020年8月2日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【DeepMind深度学习课程】序列循环神经网络，141页ppt，Sequences and Recurrent Network

【DeepMind深度学习课程】序列循环神经网络，141页ppt，Sequences and Recurrent Network

专知会员服务

86+阅读 · 2020年6月23日

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

专知会员服务

20+阅读 · 2020年4月25日

【ICLR2020】利用图神经网络进行高效概率逻辑推理，Efficient Probabilistic Logic Reasoning with Graph Neural Networks

【ICLR2020】利用图神经网络进行高效概率逻辑推理，Efficient Probabilistic Logic Reasoning with Graph Neural Networks

专知会员服务

113+阅读 · 2020年1月29日

腾讯信息流内容理解技术实践，A User-Centered Concept Mining System for Query and Document Understanding at Tencent

腾讯信息流内容理解技术实践，A User-Centered Concept Mining System for Query and Document Understanding at Tencent

专知会员服务

40+阅读 · 2019年12月15日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

CCF推荐 | 国际会议信息10条

CCF推荐 | 国际会议信息10条

Call4Papers

8+阅读 · 2019年5月27日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

已删除

将门创投

8+阅读 · 2019年1月4日

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

专知

15+阅读 · 2018年6月29日

Linguistically Regularized LSTMs for Sentiment Classification

Linguistically Regularized LSTMs for Sentiment Classification

黑龙江大学自然语言处理实验室

8+阅读 · 2018年5月4日

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

专知

11+阅读 · 2018年3月29日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

相关论文

Extended probabilities in Statistics

Arxiv

0+阅读 · 2021年11月1日

Holistic Deep Learning

Holistic Deep Learning

Arxiv

0+阅读 · 2021年10月29日

Statistical Review of Animal trials -- A Guideline

Arxiv

0+阅读 · 2021年10月29日

A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning

Arxiv

7+阅读 · 2021年8月14日

A Probabilistic Representation of DNNs: Bridging Mutual Information and Generalization

Arxiv

17+阅读 · 2021年6月18日

Modeling Multi-Label Action Dependencies for Temporal Action Localization

Arxiv

3+阅读 · 2021年3月4日

Do NLP Models Know Numbers? Probing Numeracy in Embeddings

Arxiv

5+阅读 · 2019年9月17日

Hierarchically-Refined Label Attention Network for Sequence Labeling

Hierarchically-Refined Label Attention Network for Sequence Labeling

Arxiv

3+阅读 · 2019年8月23日

Deep learning evaluation using deep linguistic processing

Arxiv

3+阅读 · 2018年5月12日

DeepProbe: Information Directed Sequence Understanding and Chatbot Design via Recurrent Neural Networks

Arxiv

9+阅读 · 2018年3月1日

微信扫码咨询专知VIP会员