In this research, we present a comprehensive, longitudinal empirical summary of the R package ecosystem, covering not only CRAN but also Bioconductor and GitHub. We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades, providing counts and trends for common metrics across packages, releases, authors, licenses, and other important metadata. We find that the historical growth of the ecosystem has been robust under all measures, with a compound annual growth rate of 29% for active packages, 28% for new releases, and 26% for active maintainers. As with many similar social systems, we find a number of highly right-skewed distributions with practical implications, including the distribution of releases per package, packages and releases per author or maintainer, package and maintainer dependency in-degree, and size per package and release. For example, the top five packages are imported by nearly 25% of all packages, and the top ten maintainers support packages that are imported by over half of all packages. We also highlight the dynamic nature of the ecosystem, recording both dramatic acceleration and notable deceleration in the growth of R. From a licensing perspective, we find that a notable majority of packages are distributed under copyleft licenses or omit licensing information entirely. The data, methods, and calculations herein provide an anchor for public discourse and industry decisions related to R and CRAN, and serve as a foundation for future research on the R software ecosystem and "data science" more broadly.
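The growth rates above are reported as compound annual growth rates. Assuming the standard definition, CAGR = (V_end / V_start)^(1/n) - 1 over n years, the sketch below shows how such a rate is computed. The abstract does not state the exact window or endpoint counts, so the numbers here are made up purely for illustration, and the function name `cagr` is hypothetical.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate under the standard definition:
    (end_value / start_value) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical illustration only (not data from this study):
# a count that grows by 29% per year for 20 years.
start = 1_000
end = start * 1.29 ** 20
print(f"{cagr(start, end, 20):.1%}")  # → 29.0%
```

Because the rate compounds, a count growing at 29% per year roughly doubles about every 2.7 years (ln 2 / ln 1.29).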