This paper studies community formation in open-source software (OSS) collaboration networks. While most current work examines the emergence of small-scale OSS projects, our approach draws on a large-scale historical dataset of 1.8 million GitHub users and their repository contributions. OSS collaborations are characterized by small groups of users who work closely together, which gives rise to communities defined by short cycles in the underlying network structure. To understand the impact of this phenomenon, we apply a pre-processing step that accounts for the cyclic network structure by using Renewal-Nonbacktracking Random Walks (RNBRW) together with the strength of pairwise collaborations, and then apply the Louvain method to identify communities within the network. Equipping Louvain with RNBRW and contribution strength provides a more assertive approach for detecting small-scale teams and reveals nontrivial differences in community detection, such as users' tendencies toward preferential attachment to more established collaboration communities. Using this method, we also identify key factors that affect community formation, including the effects of users' location and primary programming language, the latter determined by comparing each user's contribution activities. Overall, this paper offers several promising methodological insights for both open-source software experts and network scholars interested in studying team formation.
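To make the pre-processing step concrete, the following is a minimal sketch of RNBRW edge weighting under stated assumptions; it is not the implementation used in this paper. The function name `rnbrw_weights`, the parameters `n_walks` and `seed`, and the toy graph are all hypothetical. Each walk starts on a random directed edge, never steps straight back along the edge it just used, and stops when it revisits a node; the edge that closes the cycle is credited. Walks that reach a dead end are discarded. How these weights are combined with contribution strength is not specified here, so the sketch omits that step.

```python
import random


def rnbrw_weights(edges, n_walks=10000, seed=0):
    """Estimate RNBRW weights for an undirected simple graph.

    edges: iterable of (u, v) pairs (hypothetical input format).
    Returns a dict mapping frozenset({u, v}) to the fraction of
    cycle-completing walks that closed their cycle on that edge.
    """
    rng = random.Random(seed)

    # Deduplicate edges and build adjacency lists in a stable order.
    edge_list, seen, adj = [], set(), {}
    for u, v in edges:
        key = frozenset((u, v))
        if u == v or key in seen:
            continue  # skip self-loops and repeated edges
        seen.add(key)
        edge_list.append((u, v))
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    counts = {frozenset(e): 0 for e in edge_list}
    for _ in range(n_walks):
        # Start on a uniformly random edge, in a random direction.
        u, v = rng.choice(edge_list)
        if rng.random() < 0.5:
            u, v = v, u
        visited = {u, v}
        prev, cur = u, v
        while True:
            # Non-backtracking: never return directly to the previous node.
            candidates = [w for w in adj[cur] if w != prev]
            if not candidates:
                break  # dead end: the walk is discarded
            nxt = rng.choice(candidates)
            if nxt in visited:
                # The walk has closed a cycle; credit the closing edge.
                counts[frozenset((cur, nxt))] += 1
                break
            visited.add(nxt)
            prev, cur = cur, nxt

    total = sum(counts.values())
    return {e: (c / total if total else 0.0) for e, c in counts.items()}


# Toy graph: a triangle (0, 1, 2) with a pendant edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
weights = rnbrw_weights(toy_edges, n_walks=2000, seed=1)
```

In this toy graph the pendant edge can never close a cycle, so its weight is zero, while the triangle edges share all of the weight. Weights of this kind could then be passed as edge weights to a Louvain implementation, for example `louvain_communities` in NetworkX, so that edges lying on short cycles count for more during modularity optimization.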