A recent increase in data availability has made it possible to perform a variety of statistical linguistic studies. Here we use the Google Books Ngram dataset to analyze the flow of words among English, French, German, Italian, and Spanish. We study what we define as "migrant words", a type of loanword whose spelling does not change. We quantify migrant words from one language to another for different decades, and observe that most migrant words can be grouped into semantic fields and associated with historical events. We also study the statistical properties of accumulated migrant words and their rank dynamics. We propose a measure of the use of migrant words that could serve as a proxy for cultural influence. Our methodology is not free of caveats, but our results are encouraging enough to motivate further studies in this direction.
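The abstract does not state the exact criterion used to decide when a word has migrated. As an illustration only, the Python sketch below assumes one simple criterion: a word counts as migrating from a source language to a target language if it is spelled identically in both corpora and first appears in the source in an earlier decade than in the target. The function names, the dictionary inputs (word mapped to year of first appearance), and the decade-based rule are all hypothetical and are not taken from the paper; the sketch only shows how per-decade and accumulated counts could be computed under that assumption.

```python
from collections import Counter


def migrant_words_per_decade(first_seen_source, first_seen_target):
    """Count words migrating from a source to a target language, per decade.

    Hypothetical criterion (an assumption, not the paper's stated method):
    a word is 'migrant' if the identical spelling occurs in both corpora and
    its first appearance in the source falls in an earlier decade than its
    first appearance in the target. Inputs map word -> year of first use.
    """
    counts = Counter()
    for word, year_target in first_seen_target.items():
        year_source = first_seen_source.get(word)
        if year_source is None:
            continue  # spelling never seen in the source corpus
        decade_source = year_source // 10 * 10
        decade_target = year_target // 10 * 10
        if decade_source < decade_target:
            counts[decade_target] += 1  # attribute migration to arrival decade
    return dict(sorted(counts.items()))


def accumulate(per_decade):
    """Running total of migrant words up to and including each decade."""
    total = 0
    accumulated = {}
    for decade, n in sorted(per_decade.items()):
        total += n
        accumulated[decade] = total
    return accumulated


if __name__ == "__main__":
    # Toy placeholder words and invented years, not real Ngram data.
    source = {"w1": 1905, "w2": 1912, "w3": 1950, "w6": 1901, "w7": 1880}
    target = {"w1": 1921, "w2": 1915, "w3": 1948, "w5": 1960,
              "w6": 1925, "w7": 1935}
    per_decade = migrant_words_per_decade(source, target)
    print(per_decade)              # {1920: 2, 1930: 1}
    print(accumulate(per_decade))  # {1920: 2, 1930: 3}
```

Attributing each word to the decade of its arrival in the target language keeps the counts comparable across decades; a real analysis would also need frequency thresholds to filter out noise such as names or OCR errors, which this sketch ignores.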