Language is the primary medium through which human information is communicated and coordination is achieved. One of the most important functions of language is to categorize the world so messages can be communicated through conversation. While we know a great deal about how human languages vary in their encoding of information within semantic domains such as color, sound, number, locomotion, time, space, human activities, gender, body parts and biology, little is known about the global structure of semantic information and its effect on human communication. Using large-scale computation, artificial intelligence techniques, and massive parallel corpora across 15 subject areas--including religion, economics, medicine, entertainment, politics, and technology--in 999 languages, here we show substantial variation in the information and semantic density of languages and their consequences for human communication and coordination. In contrast to prior work, we demonstrate that higher-density languages communicate information much more quickly than lower-density languages. Then, using over 9,000 real-life conversations across 14 languages and 90,000 Wikipedia articles across 140 languages, we show that because there are more ways to discuss any given topic in denser languages, conversations and articles retrace and cycle over a narrower conceptual terrain. These results demonstrate an important source of variation across the human communicative channel, suggesting that the structure of language shapes the nature and texture of conversation, with important consequences for the behavior of groups, organizations, markets, and societies.