One of the primary factors that encourage developers to contribute to open source software (OSS) projects is the collaborative nature of OSS development. However, the collaborative structure of these communities remains largely unclear, partly because of the enormous scale of the data that must be gathered, processed, and analyzed. In this work, we use the World Of Code dataset, which contains commit activity data for millions of OSS projects, to build collaboration networks for ten popular programming language ecosystems, which together contain over 290M commits across over 18M projects. For each language ecosystem, we build a collaboration graph with authors and projects as nodes, which enables various forms of social network analysis at the scale of entire language ecosystems. Moreover, we capture the ecosystems' evolution by slicing each network into 30 historical snapshots. Additionally, we calculate multiple collaboration metrics that characterize the ecosystems' states. We make the resulting dataset publicly available, including the constructed graphs and the pipeline, which enables the analysis of further ecosystems.
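To make the graph construction concrete, the following is a minimal sketch of an author-project collaboration graph and its historical snapshots, not the pipeline released with the dataset. Several details are assumptions made for illustration only: the `Commit` record layout, the use of commit counts as edge weights, cumulative snapshots cut at evenly spaced timestamps, and the single example metric `mean_authors_per_project`.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple

# Edge map of the bipartite graph: (author, project) -> number of commits.
EdgeMap = dict[tuple[str, str], int]


class Commit(NamedTuple):
    """Hypothetical commit record; the real dataset layout may differ."""
    author: str
    project: str
    timestamp: int  # Unix seconds (assumed)


def build_graph(commits: Iterable[Commit]) -> EdgeMap:
    """Build a bipartite author-project graph weighted by commit count."""
    edges: EdgeMap = defaultdict(int)
    for c in commits:
        edges[(c.author, c.project)] += 1
    return dict(edges)


def snapshots(commits: list[Commit], n: int) -> list[EdgeMap]:
    """Slice the history into n cumulative snapshots (assumed slicing rule).

    Snapshot i contains every commit up to the i-th of n evenly spaced
    cutoffs between the earliest and the latest commit timestamp.
    """
    if not commits:
        return []
    start = min(c.timestamp for c in commits)
    end = max(c.timestamp for c in commits)
    result = []
    for i in range(1, n + 1):
        cutoff = start + (end - start) * i / n
        result.append(build_graph(c for c in commits if c.timestamp <= cutoff))
    return result


def mean_authors_per_project(edges: EdgeMap) -> float:
    """Example collaboration metric: average number of authors per project."""
    authors_by_project: dict[str, set[str]] = defaultdict(set)
    for author, project in edges:
        authors_by_project[project].add(author)
    if not authors_by_project:
        return 0.0
    total = sum(len(a) for a in authors_by_project.values())
    return total / len(authors_by_project)
```

With 30 passed as `n`, the call `snapshots(commits, 30)` would return one edge map per historical snapshot, and a metric such as `mean_authors_per_project` can then be evaluated on each to trace how an ecosystem changes over time.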