不同空间、时间和语法尺度的语文统计 (Language statistics at different spatial, temporal, and grammatical scales) - 专知论文

会员服务 ·

0

统计量 · 缩放 · 秩 · 多样性 · 相互独立的 ·

2022 年 7 月 2 日

Language statistics at different spatial, temporal, and grammatical scales

翻译：不同空间、时间和语法尺度的语文统计

Fernanda Sánchez-Puig,Rogelio Lozano-Aranda,Dante Pérez-Méndez,Ewan Colman,Alfredo J. Morales-Guzmán,Carlos Pineda,Carlos Gershenson

Statistical linguistics has advanced considerably in recent decades as data has become available. This has allowed researchers to study how statistical properties of languages change over time. In this work, we use data from Twitter to explore English and Spanish considering the rank diversity at different scales: temporal (from 3 to 96 hour intervals), spatial (from 3km to 3000+km radii), and grammatical (from monograms to pentagrams). We find that all three scales are relevant. However, the greatest changes come from variations in the grammatical scale. At the lowest grammatical scale (monograms), the rank diversity curves are most similar, independently on the values of other scales, languages, and countries. As the grammatical scale grows, the rank diversity curves vary more depending on the temporal and spatial scales, as well as on the language and country. We also study the statistics of Twitter-specific tokens: emojis, hashtags, and user mentions. These particular type of tokens show a sigmoid kind of behaviour as a rank diversity function. Our results are helpful to quantify aspects of language statistics that seem universal and what may lead to variations.

翻译：近几十年来,随着数据的出现,统计语言有了相当大的进步。这使得研究人员能够研究不同语言的统计特性随时间变化。在这项工作中,我们利用来自Twitter的数据来探索英语和西班牙语,以探讨不同尺度的等级多样性:时间(从3小时间隔到96小时间隔)、空间(从3公里到3000公里至3000公里的半径)和语法(从单数到五角形)以及语法(从单数到五角形),我们发现所有三个尺度都是相关的。然而,最大的变化来自语法尺度的变化。在最低的语法尺度(单数)中,等级多样性曲线与其他尺度、语言和国家的值非常相似。随着语法规模的扩大,等级多样性曲线因时间和空间尺度以及语言和国家而变化更大。我们还研究了Twitter特定标志的统计:mojis、标签和用户提到的。这些特定类型的标志显示了等级多样性功能。我们的成果有助于量化语言统计中似乎具有普遍性和可能导致变化的方面。

0

相关内容

统计量

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

SMYD3调控Wnt/β-catenin信号通路的分子机制及其在肝细胞癌中功能的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于肝星状细胞自噬及调控通路探讨片仔癀抗肝纤维化的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

循环肿瘤细胞Stat3/Twist双信号通路交互作用对EMT编程的乳腺癌转移的调控与干预

国家自然科学基金

0+阅读 · 2014年12月31日

ATM基因在糖尿病大血管病变防治中的作用机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

靶向LDH-A能量代谢对T细胞急性淋巴细胞白血病的抗白血病效应及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

HMGB1/RAGE信号通过改变Kupffer细胞亚型而发挥致肝纤维化作用的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Skp2-p27信号通路在卵巢早衰发病中的作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

AR/let-7及其下游分子对ER-AR+乳腺癌干细胞生长的调控机制

国家自然科学基金

0+阅读 · 2011年12月31日

脂肪因子adiponutrin在肥胖、胰岛素抵抗和2型糖尿病发病机制中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

Legumain在乳腺癌骨转移和破骨损伤过程中的作用机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

Robust Tests of Model Incompleteness in the Presence of Nuisance Parameters

Arxiv

0+阅读 · 2022年8月24日

Actuarial Applications of Natural Language Processing Using Transformers: Case Studies for Using Text Features in an Actuarial Context

Arxiv

0+阅读 · 2022年8月22日

Discovery and density estimation of latent confounders in Bayesian networks with evidence lower bound

Arxiv

0+阅读 · 2022年8月22日

Gender Bias and Universal Substitution Adversarial Attacks on Grammatical Error Correction Systems for Automated Assessment

Arxiv

0+阅读 · 2022年8月19日

Verifiable Differential Privacy For When The Curious Become Dishonest

Arxiv

0+阅读 · 2022年8月18日

Challenges of Artificial Intelligence -- From Machine Learning and Computer Vision to Emotional Intelligence

Arxiv

19+阅读 · 2022年1月5日

Advances in adversarial attacks and defenses in computer vision: A survey

Arxiv

22+阅读 · 2021年9月2日

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

Arxiv

12+阅读 · 2021年8月30日

Adaptive Synthetic Characters for Military Training

Adaptive Synthetic Characters for Military Training

Arxiv

50+阅读 · 2021年1月6日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

VIP会员

文章信息

相关主题

相互独立的

相关VIP内容

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型时代的文档智能：综述

蜂窝通信是否是无人机与无人地面战车主宰战场的关键？

文档视觉问答简述

最新新Agent综述！76页327篇论文梳理，北交大桑基韬教授团队发布《迈向模型原生智能体式人工智能的范式转变综述》

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Robust Tests of Model Incompleteness in the Presence of Nuisance Parameters

Arxiv

0+阅读 · 2022年8月24日

Actuarial Applications of Natural Language Processing Using Transformers: Case Studies for Using Text Features in an Actuarial Context

Arxiv

0+阅读 · 2022年8月22日

Discovery and density estimation of latent confounders in Bayesian networks with evidence lower bound

Arxiv

0+阅读 · 2022年8月22日

Gender Bias and Universal Substitution Adversarial Attacks on Grammatical Error Correction Systems for Automated Assessment

Arxiv

0+阅读 · 2022年8月19日

Verifiable Differential Privacy For When The Curious Become Dishonest

Arxiv

0+阅读 · 2022年8月18日

Challenges of Artificial Intelligence -- From Machine Learning and Computer Vision to Emotional Intelligence

Arxiv

19+阅读 · 2022年1月5日

Advances in adversarial attacks and defenses in computer vision: A survey

Arxiv

22+阅读 · 2021年9月2日

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

Arxiv

12+阅读 · 2021年8月30日

Adaptive Synthetic Characters for Military Training

Adaptive Synthetic Characters for Military Training

Arxiv

50+阅读 · 2021年1月6日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

相关基金

SMYD3调控Wnt/β-catenin信号通路的分子机制及其在肝细胞癌中功能的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于肝星状细胞自噬及调控通路探讨片仔癀抗肝纤维化的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

循环肿瘤细胞Stat3/Twist双信号通路交互作用对EMT编程的乳腺癌转移的调控与干预

国家自然科学基金

0+阅读 · 2014年12月31日

ATM基因在糖尿病大血管病变防治中的作用机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

靶向LDH-A能量代谢对T细胞急性淋巴细胞白血病的抗白血病效应及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

HMGB1/RAGE信号通过改变Kupffer细胞亚型而发挥致肝纤维化作用的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Skp2-p27信号通路在卵巢早衰发病中的作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

AR/let-7及其下游分子对ER-AR+乳腺癌干细胞生长的调控机制

国家自然科学基金

0+阅读 · 2011年12月31日

脂肪因子adiponutrin在肥胖、胰岛素抵抗和2型糖尿病发病机制中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

Legumain在乳腺癌骨转移和破骨损伤过程中的作用机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员