Working from a dataset of 118 billion messages spanning the start of 2009 to the end of 2019, we identify and explore the relative daily use of over 150 languages on Twitter. We find that eight languages account for 80% of all tweets, with English, Japanese, Spanish, and Portuguese being the most dominant. To quantify social spreading in each language over time, we compute the 'contagion ratio': the ratio of retweets to organic messages. We find that for the most common languages on Twitter there is a growing, though not universal, tendency to retweet rather than share new content. By the end of 2019, the contagion ratios for half of the top 30 languages, including English and Spanish, had risen above 1, the naive contagion threshold. In 2019, the 5 languages with the highest average daily ratios were, in order, Thai (7.3), Hindi, Tamil, Urdu, and Catalan, while the bottom 5 were Russian, Swedish, Esperanto, Cebuano, and Finnish (0.26). Further, we show that over time, the contagion ratios for most common languages are growing more strongly than those of rare languages.
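As a rough illustration of the contagion ratio described in the abstract, the sketch below divides a language's daily retweet count by its daily count of organic messages and compares the result with the naive threshold of 1. The function name, the `DailyCounts` container, the language labels, and all message counts are hypothetical and chosen only for illustration; they are not taken from the dataset, and the handling of days with no organic messages is an assumption.

```python
from dataclasses import dataclass

# Naive contagion threshold named in the abstract: a ratio above 1 means
# retweets outnumber organic messages.
NAIVE_THRESHOLD = 1.0


@dataclass
class DailyCounts:
    """Hypothetical container for one language's message counts on one day."""
    retweets: int
    organic: int


def contagion_ratio(counts: DailyCounts) -> float:
    """Return retweets divided by organic messages for one language-day."""
    if counts.organic == 0:
        # Assumption: treat a day with no organic messages as undefined.
        raise ValueError("contagion ratio is undefined with zero organic messages")
    return counts.retweets / counts.organic


# Made-up counts for two placeholder languages (not real data).
daily = {
    "lang_a": DailyCounts(retweets=1500, organic=1000),
    "lang_b": DailyCounts(retweets=400, organic=1000),
}

ratios = {lang: contagion_ratio(c) for lang, c in daily.items()}
above_threshold = sorted(lang for lang, r in ratios.items() if r > NAIVE_THRESHOLD)

for lang, r in ratios.items():
    print(f"{lang}: contagion ratio = {r:.2f}")
print("above naive threshold:", above_threshold)
```

In this sketch a language is counted as above the threshold only when its ratio strictly exceeds 1; averaging the daily ratios over a year, as the 2019 rankings in the abstract do, would be a separate step applied to a series of such values.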