
April 29, 2021

How (Non-)Optimal is the Lexicon?


Tiago Pimentel,Irene Nikkarinen,Kyle Mahowald,Ryan Cotterell,Damián Blasi

From arXiv. Tiago Pimentel and Irene Nikkarinen contributed equally to this work. Accepted at NAACL 2021.

The mapping of lexical meanings to wordforms is a major feature of natural languages. While usage pressures might assign short words to frequent meanings (Zipf's law of abbreviation), the need for a productive and open-ended vocabulary, local constraints on sequences of symbols, and various other factors all shape the lexicons of the world's languages. Despite their importance in shaping lexical structure, the relative contributions of these factors have not been fully quantified. Taking a coding-theoretic view of the lexicon and making use of a novel generative statistical model, we define upper bounds for the compressibility of the lexicon under various constraints. Examining corpora from 7 typologically diverse languages, we use those upper bounds to quantify the lexicon's optimality and to explore the relative costs of major constraints on natural codes. We find that (compositional) morphology and graphotactics can sufficiently account for most of the complexity of natural codes -- as measured by code length.
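The coding-theoretic framing in the abstract can be illustrated with a minimal sketch. This is not the paper's generative model or its bounds; it only compares the frequency-weighted length of a toy lexicon with the Shannon entropy bound for the same word distribution. The word counts, the function names, and the choice to count one end-of-word symbol per word (so that the natural code is prefix-free) are all assumptions made for illustration.

```python
import math

def natural_code_length(counts):
    """Frequency-weighted mean word length in symbols.

    One end-of-word symbol is added to every word (an assumption of this
    sketch) so that the lexicon forms a prefix-free code.
    """
    total = sum(counts.values())
    return sum(c / total * (len(word) + 1) for word, c in counts.items())

def entropy_bound(counts, alphabet_size):
    """Shannon entropy of the word distribution in base-`alphabet_size` units.

    No uniquely decodable code over an alphabet of that size can have a
    shorter expected length than this value.
    """
    total = sum(counts.values())
    return -sum(
        c / total * math.log(c / total, alphabet_size) for c in counts.values()
    )

# Hypothetical toy counts, not taken from any corpus in the paper.
toy_counts = {"a": 50, "the": 30, "cat": 10, "lexicon": 10}

# Distinct letters used by the toy lexicon, plus the end-of-word symbol.
alphabet_size = len({ch for word in toy_counts for ch in word}) + 1

natural = natural_code_length(toy_counts)
bound = entropy_bound(toy_counts, alphabet_size)

print(f"natural code length: {natural:.3f} symbols per word")
print(f"entropy bound:       {bound:.3f} symbols per word")
print(f"gap:                 {natural - bound:.3f} symbols per word")
```

The gap between the two numbers is a crude, unconstrained measure of how far the toy lexicon is from the shortest possible code. The abstract's point is that constraints such as morphology and graphotactics, which this sketch does not model, account for most of that gap in real languages.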

