Change in language use is driven by cultural forces; it is unclear whether the same holds for programming languages. Programming languages are designed to be used by humans, but because they mediate interaction with computer hardware rather than with a human audience, opportunities for their lexicon to evolve may be limited. I tested this in R, an open source, mature and commonly used programming language for statistical computing. From a corpus of 360,321 GitHub repositories published between 2014 and 2021, I extracted 168,857,044 function calls to act as n-grams of the R language. Over the eight-year period, R rapidly diversified and underwent substantial lexical change, driven by the increasing popularity of the tidyverse collection of community packages. My results provide evidence that users can influence the evolution of programming languages, with patterns that match those observed in natural languages and reflect genetic evolution. R's evolution may have been spurred by increasing analytic complexity, which drew new users to R and created both selective pressure for an alternate lexicon and accompanying advective change. The speed and magnitude of this change may have flow-on consequences for the readability and continuity of analytic and scientific inquiries codified in R and similar languages.
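The abstract does not say how function calls were extracted, so the following is only an illustrative sketch of the general idea of treating function calls as the n-grams of R code. It is written in Python and uses a simple regular expression rather than a real R parser; the pattern, the keyword list and the helper names (`strip_comments`, `count_calls`, `relative_frequencies`) are assumptions made for this example, not the author's method.

```python
import re
from collections import Counter

# Assumed pattern: an R identifier, optionally namespaced as pkg::name,
# immediately followed by an opening parenthesis.
CALL_PATTERN = re.compile(
    r"(?<![A-Za-z0-9._])"
    r"((?:[A-Za-z.][A-Za-z0-9._]*::)?[A-Za-z.][A-Za-z0-9._]*)\s*\("
)

# R keywords that are followed by "(" but are not function calls.
KEYWORDS = {"if", "for", "while", "function"}


def strip_comments(source):
    """Naively drop everything after '#' on each line (ignores '#' in strings)."""
    return "\n".join(line.split("#", 1)[0] for line in source.splitlines())


def count_calls(source):
    """Count how often each function name is called in a string of R code."""
    names = CALL_PATTERN.findall(strip_comments(source))
    return Counter(name for name in names if name not in KEYWORDS)


def relative_frequencies(counts):
    """Convert raw call counts into each function's share of all calls."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}
```

Applying `count_calls` to the scripts from each year and comparing the resulting `relative_frequencies` would give one rough view of how the share of, say, tidyverse functions shifts over time. A faithful analysis would need a proper R parser, since a regular expression misreads calls inside strings and other edge cases.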